An accident waiting to happen


The release of radioactive material at a US nuclear-waste repository reveals an all-too-common
picture of complacency over safety and a gradual downgrading of regulations.


On St Valentine’s Day, the United States’ flagship geological repository
for nuclear waste dodged a bullet. Deep below the New
Mexico desert, something went wrong. One or more drums
of nuclear waste ruptured, probably because of a chemical reaction or
explosion. Thousands of drums are held in the 655-metre deep under-
ground repository, designed to safely contain for thousands of years
the low- and medium-level radioactive remnants of US military pro-
grammes. Just 15 years after it opened, the Waste Isolation Pilot Plant
(WIPP) near Carlsbad has been hurriedly closed down while officials
seek answers.

Parts of the repository were contaminated with long-lived transuranic
radioactive elements, including americium and plutonium. The extent
of the contamination is still being established, but the amounts released
were not small, and last week officials announced that the repository will
remain closed for at least 18 months and possibly much longer. A small
amount of radioactivity was also vented to the surface, and 21 workers
were exposed to what seem to have been low levels.

It is clear that both the accident and its consequences could have been
much worse. Maintenance resulting from a separate and unrelated acci-
dent on 5 February — a vehicle fire underground — meant that from
6 to 10 February the ventilation was unfiltered, and real-time continuous
radiation monitors were switched off. Had the accident happened then,
rather than on 14 February, the release would only have been detected
during manual radiation readings that are taken each morning, meaning
that workers would unknowingly have been exposed, and higher levels
of radioactivity would have reached the environment.

On the evening of the accident, a continuous radiation monitor
underground, which sounded the alert to high radiation levels in
a waste-storage area at 11.14 p.m., was the only one in service, as all
the others were out of order. This resulted in automatic switching of
the ventilation to pass by high-efficiency particulate air (HEPA) fil-
tration to catch radioactive particles. Shortly after the alert, a vigilant
shift manager opened large fans to vent the repository contamination
through the HEPA filters to the environment; this should have happened
automatically with no need for manual intervention — but it had been
switched to manual some years ago. The ventilation system also fell short
of nuclear-safety norms, as it had gaps that allowed some radiation to
reach the environment. Workers plugged these gaps with high-density
foam on 6 March.

The mantra for WIPP was to “start clean and stay clean”. Accidents,
the government said, would never happen. But as a News article on page
267 details, a Department of Energy (DOE) report on the incident out-
lines how fanciful that promise was. The report describes an atmosphere
of complacency. It lists a litany of failings, from an insidious continual
deregulation of safety standards and cutting of corners, to dilapidated
safety equipment, and a lax security culture. WIPP’s response to the
accident itself was “delayed and ineffective”, adds the report.

The consequences of a release of radioactivity at WIPP, a repository
for low- and medium-level waste deep underground in a remote region,
are much less serious than those at a nuclear power plant. But as with the 
Fukushima nuclear power plant in Japan, the same characteristic errors 
were in play: hubris, overconfidence in safety assumptions, dilution or 
non-respect of safety standards, a weak security culture and, crucially, 
lack of tough, independent scientific and technical oversight. 

And, as at Fukushima, it took an accident to uncover glaring safety
weaknesses and the lack of a strong safety culture — an essential
element in safe nuclear operation. The DOE, which operates WIPP,
and the WIPP regulators — including the Environmental Protection
Agency — seem to have been asleep at the wheel. The uncovering of
these safety deficiencies is all the more disconcerting given that the
authorities have been proposing to expand WIPP from a site
for low- and medium-level waste to one that
would also hold both high-level surplus weapons-grade plutonium
and much hotter spent nuclear fuel.

In the past, WIPP was a model of how to integrate science into 
the planning and design of a nuclear-waste repository, and how to 
gain public confidence in that science. Its recent shortcomings are a 
further blow to the pressing need to find ways to deal safely with the 
vast quantities of accumulated defence and civilian wastes. WIPP and 
planned repositories elsewhere would do well to heed the lessons of 
WIPP’s troubles, and strive to ensure that transparent independent 
scientific oversight of projects is made a top priority and maintained.


Full support 


Germany should follow the United Kingdom’s
lead and spell out the benefits of animal research.


Scientists in the United Kingdom have reason to be grateful this
week, after research institutions came together to pledge greater
public support for researchers who use animals in their work.

The UK ‘concordat’ sets out how institutions that undertake animal
research will publicize it. Signatories, which include major chari-
ties, drug companies and universities, say that they will increase the
amount of information they provide about what happens in their
laboratories to inform the public about the value of animal research,
and will report annually on how they are moving to greater openness.
It is a laudable aim, and scientists in another European country must
be wondering what they need to do to earn similar support. While the
United Kingdom was putting the final touches to its concordat, six
newspapers in Germany were running a full-page advertisement ques- 
tioning whether scientists who experiment on animals are even human. 

The advert opens with the quote: “Animal experimenters are a particular
type of creature — one should not casually call them human.”
It publishes a photograph of primate researcher Andreas Kreiter of the 
University of Bremen, a long-standing target of campaigners in the 
country, and describes him as a tormenter of animals whose research 
is without value. The advert closes with calls for citizens to treat all ani- 
mal experimenters with contempt and denounce their work publicly. 

Last week, the powerful Alliance of Science Organisations in Ger- 
many declared in a press statement that the lobby group that placed 
the adverts, Tierversuchsgegner Bundesrepublik Deutschland, had 
crossed acceptable boundaries. The alliance’s strong words represent 
a welcome change from its unhelpful default policy of keeping its head 
below the parapet. But German scientists deserve more. 

Now that it has broken its long silence over the use of animals in 
research, the alliance cannot retreat. It should follow the UK example 
and push for wider public awareness. Given the political weight of the 
institutions it represents — the Max Planck Society, the Leopoldina 
national academy, the universities and the Helmholtz Association 
among them — such a stance could make a crucial difference.

Scientists across Germany have been lobbying for nearly three years 
for the alliance to create a web resource for journalists and the public 
that makes available the true facts about research using animals. The 
Max Planck Society, which is taking the lead in a dragged-out effort 
to gather data about the value of such a resource, has doubts. But this 
should proceed as soon as possible. 

The Tierversuchsgegner’s advertising campaign may have been 
expressly designed to provoke a response, to keep the subject of ani- 
mal research in the media. That is all the more reason for the alliance 
to collate an accessible pool of information for the public. 

An immediate goal could be to prevent a recurrence of the advert,
which ran in publications including the quality intellectual nationals Die
Zeit and the Frankfurter Allgemeine Zeitung. What were they thinking?
Germany takes the right of freedom of expression very seriously. But
newspapers must balance this right with the first clause of Germany's
1949 constitution, which states that the dignity of humans is inviolable.
This was designed to ensure that a regime could never again label
people ‘subhuman’, and so unworthy of life, as the Nazis did.
This is not the first time that such disturbing terminology has
been levelled at science in Germany. At a public lecture in March, the
award-winning novelist Sibylle Lewitscharoff attacked reproductive
medicine, and referred to people born by artificial insemination as
‘half-creatures’.

The use of such aggressive language in debates about the ethical
limits to medical research is worrying. When it comes to the use
of animals in science, it underlines the importance of a proactive
public stance. The most fiery animal-rights groups
may be small, but they amplify their messages by appealing to people's 
emotions. To make their points, they often lie or omit key information 
about the tight regulation and oversight of animal experiments. Jour- 
nalists have no ready source of counter-information. Research agencies 
have been nervous of commenting openly, fearing that it might open 
more scientists to attack. Many medical charities avoid mentioning that 
they support research with animals for fear of putting off donors. 

In 2010, frustrated academic and industry researchers created the 
Basel Declaration, whose signatories commit to speaking publicly 
about their work and the value of experiments with animals. More 
than 2,300 individuals around the world have signed up — 431 of them 
in Germany — and 13 institutes and societies have given their sup- 
port. Still, it remains a relatively small effort, and relies on donations 
to cover its costs. The UK concordat represents a more powerful tool 
that other countries, Germany chief among them, should emulate.




Hard data 


It has been no small feat for the Protein Data 
Bank to stay relevant for 100,000 structures. 


Sherlock Holmes understood: “It is a capital mistake,” he said,
“to theorise before one has data.” Data are the lifeblood of
science, the foundation of innovation. Behind every great
discovery is a pile of data; but, crucially, it should not be too far
behind.

For more than four decades, the Protein Data Bank (PDB) has
been where structural biologists keep their data close. Nearly every
biology-publishing journal in the world, Nature included, requires
protein structures to be deposited in the PDB before publication.

So there was considerable worry at the database when Nature
accepted a molecular map of HIV’s capsid protein shell last year
(G. Zhao et al. Nature 497, 643–646; 2013). The multimillion-atom
complex was larger than anything then in the PDB, and the database's
team had to devise a way to make the data dump available (and useful)
at short notice.

Thus it goes at the PDB — whose trove surpasses 100,000 struc-
tures this week (see page 265) — and other long-running archives
that have managed to stay relevant and essential. It is not easy. Just
ask the scientists, funders, technicians and others who shepherd
them.

Money is often the limiting factor. Computer storage and process-
ing power may be getting cheap as chips, but much of the expense is
in paying the people (many of them highly trained scientists) who
organize and verify data entries, and engage scientific communities.

There are many ways for a database to stay in the black. The 
three-decades-old GenBank, a clearing house for DNA sequences, 
is funded directly by the US government’s support of the National 
Center for Biotechnology Information (NCBI). By contrast, the 
50-year-old Cambridge Structural Database, which stores 700,000 
small-molecule structures, gets by on support from industry and 
around 1,300 institutes. 

The PDB is actually hosted by several organizations that provide 
access to the same data trove, each funded independently. Gerard 
Kleywegt, who heads the European franchise at the European Bioin- 
formatics Institute (EBI) in Hinxton, UK, says that healthy competi- 
tion between his portal and others in the United States and Japan helps 
him to get grants, and keeps the database pertinent. Scientists “vote
with their mouse clicks”, he says. “They go to the place where they get
the best answer for their questions.” 

In the 1970s, protein structures were consumed by a small community
of X-ray crystallographers interested in the nitty-gritty of individual
enzymes. Now scientists use a range of techniques to determine
structures, and researchers of many stripes want to know how proteins 
behave in a larger context, such as in a malignant cancer cell. A data- 
base must change with the times, or face extinction. 

The closure of a database is not so awful — as long as its useful 
information remains available elsewhere. In 2011, NCBI announced 
that it was mothballing a database that collected information about 
protein fragments used in proteomics experiments. A competing 
database run by the EBI has since swallowed 
up those data. But with 100,147 structures (as 
Nature went to press), and growing at about 200 
per week, the PDB, at least, shows no sign of
folding.


Is it right to reverse extinction?


Several groups are working to bring back long-dead species, but these
efforts could undo some hard-learned lessons, argues Ben Minteer.


The North American sky, according to historical accounts, was
once black with passenger pigeons. Hunters, however, saw to
it that the sky was clear of the birds by the second half of the
nineteenth century. Martha, the last individual of the species, expired
in the Cincinnati Zoo in 1914.

Writers have long elegized this vanished bird. The great conser-
vationist-philosopher Aldo Leopold issued the most poignant trib-
ute in his 1949 book A Sand County Almanac: “We grieve,” he wrote,
“because no living man will see again the onrushing phalanx of victori-
ous birds, sweeping a path for spring across the March skies, chasing
the defeated winter from all the woods and prairies of Wisconsin.”

But what if we could once again see those victorious birds sweeping
their path across the March skies?

Leopold could not have known that only a
handful of decades after he wrote these words
we would be on the verge of a scientific revolu-
tion in efforts to reverse the death of species. The
‘de-extinction’ movement — a prominent group
of scientists, futurists and their allies — argues
that we no longer have to accept the finality of
extinction. By applying techniques such as clon-
ing and genetic engineering, they believe that we
can and should return lost species such as the pas-
senger pigeon to the landscape. This is the goal
of the San Francisco, California-based Long Now
Foundation, which is actively supporting scientific
efforts to recreate the lost bird within its ‘Revive
& Restore’ project. But it does not stop there. Sci-
entists in Spain say they are close to cloning the
Pyrenean ibex, a mountain goat that took its last breath in 2000. Other
species have also been targeted, including the Tasmanian tiger and even
the woolly mammoth.

The de-extinction lobby makes persuasive arguments. The most
powerful among them appeal to our sense of justice: de-extinction is
our opportunity to right past wrongs and to atone for our moral fail-
ings. Advocates also point to the sense of wonder that the revival of
extinct species could encourage among the public. Although we will
always have passenger pigeons in museums and books, “book-pigeons,”
Leopold lamented, “cannot dive out of a cloud to make the deer run
for cover, nor clap their wings in thunderous applause of mast-laden
woods.” De-extinctionists argue further that the revived species will
restore lost ecological functions and enhance the diversity of ecosystems.

At the same time, the de-extinction proposal raises considerable
concerns. Resuscitated species could create problems in contempo-
rary environments and for native species that
have evolved in the absence of the vanished
biota. As with the introduction of any species
into a new environment, there are risks of disease
transmission and biological invasion. Some
conservationists also express the fear that, given decades of ecological
change and human development, the landscape won't be able to sup-
port the revived populations.

Others fret about the limited genetic diversity of any ‘de-extin- 
guished’ species and question the assumption that reviving a genome 
is the same thing as recovering the animal's behaviour and identity, 
which evolved over millennia. And there is also the particularly dis- 
tressing concern that such aggressive manipulation of wildlife might 
actually end up diminishing our desire (and our limited resources) to 
conserve extant species — and that it would entail harmful interfer- 
ence in the lives of animals. 

The most troubling aspect of de-extinction, however, is what it might 
mean for us. Attempting to revive lost species is 
in many ways a refusal to accept our moral and 
technological limits in nature. De-extinction 
thus reflects a new kind of Promethean spirit that 
attempts to leverage our boundless cleverness and 
powerful tools for conservation rather than for 
human enhancement. But things did not end very 
well for Prometheus. 

Leopold was aware of our tendency to let our 
gadgets get out in front of our ethics. “Our tools,” 
he cautioned in the late 1930s, “are better than we 
are, and grow better faster than we do. They suf- 
fice to crack the atom, to command the tides. But 
they do not suffice for the oldest task in human 
history: to live on a piece of land without spoiling
it.” The real challenge is to live more lightly on the
land and to address the moral and cultural forces 
that drive unsustainable and ecologically destructive practices. 

That is why there is great virtue in keeping extinct species extinct. 
Meditation on their loss reminds us of our fallibility and our finitude. 
We are a wickedly smart species, and occasionally a heroic and even
exceptional one. But we are a species that often becomes mesmerized
by its own power. 

It would be silly to deny the reality of that power. But we should also 
cherish and protect the capacity of nature, including those parts of 
nature that are no longer with us, to teach us something profound about 
the value of collective self-restraint and human limits. Few things teach 
us this sort of earthly modesty any more. 

It cuts against the progressive aims of science to say it, but there can 
be wisdom in taking our foot off the gas, in resisting the impulse to 
further control and manipulate; to fix nature. 


Ben A. Minteer is a professor of environmental ethics at Arizona 
State University in Tempe. A longer version of this essay will appear in 
his forthcoming book (co-edited with Steve Pyne), After Preservation: 
Saving American Nature in the Age of Humans. 

e-mail: ben.minteer@asu.edu 
Tractor beam
pulls in objects 


An array of ultrasound beams
can drag centimetre-sized 
objects towards it. 

Mike McDonald at the 
University of Dundee, 
UK, Gabriel Spalding at 
Illinois Wesleyan University 
in Bloomington and 
their colleagues sculpted 
interference patterns in the 
array so that much of the 
acoustic energy bounced off 
the sides or rear of an object 
in front of the array. This 
drove the object towards the 
ultrasound sources. The effect 
has been previously shown 
with light waves, but sound 
waves can move larger objects. 

Such control might prove
useful in non-invasive surgery:
for example, it could be used
to manipulate drug-delivery
packages inside the body or to
precisely cut out tumours.
Phys. Rev. Lett. 112, 174302 
(2014) 


How El Niño slows
the planet's spin


The El Niño Pacific weather
event affects how long the day
is, but two types of El Niño do
this in two different ways.

Weather changes affect the 
planet’s rotation speed, and 
thus day length, by changing 
the atmosphere’s pressure 
over topographical features. A 
team led by Olivier de Viron, 
now at the University of La 
Rochelle in France, studied 
atmospheric behaviour 
between 1948 and 2013. 

The researchers found that
when El Niños make Pacific
waters warmer in the east,
they set up strong pressure
gradients above big mountain
ranges (such as the Andes)
that increase the time it takes
the planet to spin by slightly
more than 0.1 millisecond.
By contrast, El Niños with
warmer central Pacific waters
produce only about half as
much Earth-changing drag.
Geophys. Res. Lett. http://doi.org/snq (2014)


Injury shapes squid behaviour


Squid that are sensitized to pain by injury are
quicker to flee from predators, showing an
adaptive benefit to injury and pain.

Robyn Crook and Edgar Walters of the
University of Texas Medical School at Houston
and their colleagues took several squid
(Doryteuthis pealeii; pictured) and inflicted a
minor injury on one arm of each animal. When
exposed to black sea bass, the previously injured
squid fled or hid from these predators earlier
than uninjured animals. But squid that were
treated with anaesthetics before the injury, and
so did not develop neural sensitization, failed to
change their behaviour. As a result, these animals
were less likely to survive encounters with the
predator than injured individuals that were not
anaesthetized. This is the first experimental
evidence that pain-like neural sensitization is an
adaptive response to injury, the authors say.

Curr. Biol. http://doi.org/sp8 (2014)


Longlines better
for deep seas


Fishing with longlines has
little effect on the vulnerable
ecosystems of the deep sea,
according to Telmo Morato and
his team at the University of the
Azores in Horta, Portugal.

Deep-sea fishing practices
such as trawling have proved
controversial owing to
concerns about damage to
slow-growing species at the
bottom of the ocean. The
researchers studied data from
longline fishing, a technique
that uses one main line with
many shorter, hooked lines
attached, around the Azores
islands, and compared them to
published data on the effects of
bottom trawling. They estimate
that between 4,000 and 23,000
longline deployments would be
needed to remove 90% of cold-
water corals in a given area,
compared with just 13 trawls.
Regulated longline fishing
could be a more sustainable
method of deep-sea fishing
than trawling, the authors
suggest.

Sci. Rep. 4, 4837 (2014)


Detecting rainfall 
from the bottom up 


A method that allows
researchers to estimate global
rainfall levels using soil-
moisture data could help to
improve hazard planning for
floods and landslides.


To estimate rainfall in 
places that lack ground-based 
rain gauges, researchers 
rely on satellite data of 
atmospheric moisture, but 
this is notoriously inaccurate. 
Luca Brocca at the National 
Research Council in Perugia, 
Italy, and his colleagues 
developed an algorithm that 
calculates rainfall amounts on 
the basis of satellite data on 
soil moisture. They compared 
their estimates with rain-gauge 
data and found that their 
method accurately estimates 
rainfall in several regions 
around the world. 

Moreover, their algorithm 
is better than a state-of- 
the-art method at detecting 
light rainfall events and 
precipitation at high latitudes. 
J. Geophys. Res. Atmos. 
http://doi.org/sp7 (2014) 


Graphene analogue 
carries current 


A self-assembling polymer that 
forms thin films and conducts 
electricity could beat graphene 
as a candidate material for 
flexible electronics. 

Graphene, made of an 
atom-thick sheet of carbon, 
is flexible but cannot be 
used as a semiconductor in 
transistors because it lacks a 
‘band gap’. Mircea Dinca at 
the Massachusetts Institute 
of Technology in Cambridge 
and his colleagues mixed 
nickel with an organic 
compound called HITP and 
ammonia in water to produce 
a graphene-like structure with 
the important band gap. 

The ingredients self-
assemble into a flat,
honeycomb-like structure
(pictured) that has excellent
electrical conductivity, unlike
most other self-assembled
organic–inorganic systems.
The team studied the material
only in bulk form, but say
that the results could be
even better if the polymer
was in two-dimensional
sheets, perhaps leading to
more efficient solar cells and
supercapacitors.

J. Am. Chem. Soc. http://doi.org/spj (2014)


When brown and 
polar bears split 


Polar bears evolved adaptations 
specific to the Arctic in fewer 
than 20,500 generations, and 
diverged from brown bears 
much more recently than is 
sometimes claimed. 

Rasmus Nielsen at the 
University of California, 
Berkeley, and his colleagues 
sequenced the genomes 
of 79 polar bears (Ursus 
maritimus) and 10 brown 
bears (Ursus arctos) and found 
that the two species diverged 
between 343,000 and 479,000 
years ago. 

Many of the genes under the 
greatest selection pressure in 
the polar bear are associated 
with the cardiovascular 
system. In particular, this 
bear seems to have evolved 
modifications in its vascular 
system that allow the animal to 
tolerate an extremely fatty diet 
made up mostly of blubbery 
seal meat. 

Cell http://doi.org/sp3 (2014) 
For a longer story on this research, 
see go.nature.com/zovyry 


ASTROPHYSICS 


Big planets could 
alter star rotation 


Massive planets with close-in 
orbits — also known as hot 
Jupiters — may influence the 
rotation and surface activity of 
their host stars. 

Katja Poppenhaeger and
Scott Wolk at the Harvard-
Smithsonian Center for
Astrophysics in Cambridge,
Massachusetts, analysed
the emissions of binary-
star systems, in which only
one of the two stars in the
system hosted an exoplanet.
Comparing the differences
between the emissions of the
stars in each pair allowed
the authors to measure the
influence of the exoplanet on
its host star. Using X-ray data
from the Chandra and XMM-
Newton space telescopes,
the researchers found that
the stars hosting hot Jupiters
showed more magnetic
activity than their planet-free
companions.

Magnetic activity increases
with rotation, so the authors
suggest that the gravitational
influence of the hot Jupiters
may have counteracted the
natural slowing of their host
stars’ spin over time.

Astron. Astrophys. 565, L1 (2014)


RESEARCH HIGHLIGHTS


SOCIAL SELECTION


Maths reality check resonates online


Biologists of all stripes are sharing an essay by Harvard
University mathematician-turned-biologist Jeremy
Gunawardena that makes a sobering observation: the
mathematical equations at the core of many biological models
fail to reflect nature. He argues that the components of all
quantitative models should be verifiable and, most of all, the
conclusions should be falsifiable. Or, in his words: “Stick the
model's neck out.” Jason Moore, a geneticist at Dartmouth
College in New Hampshire, tweeted: “This paper is so good
I am actually printing it out” — high praise in the paperless age.


BMC Biol. 12, 29 (2014)


Based on data from altmetric.com.
Altmetric is supported by Macmillan
Science and Education, which owns
Nature Publishing Group.


Thyroid makes 
young hearts grow 


A surge of thyroid hormone 
just before adolescence 
causes mouse hearts to grow 
drastically, suggesting that 
the organ may be easier to 
regenerate than previously 
thought. 

Ahsan Husain of Emory
University School of
Medicine in Atlanta, Georgia,
and Robert Graham of
the Victor Chang Cardiac
Research Institute in 
Sydney, Australia, and their 
colleagues labelled heart 
muscle cells of baby mice 
with a chemical. When the 
mice were 15 days old, the 
number of cardiomyocytes 
(pictured, red) increased by 
about 40%. 

It had previously been 
thought that cardiomyocytes 
stopped replicating just after 
birth. The findings suggest 
that giving thyroid hormone 
to babies with heart defects 
might help to repair the 
organ. 

Cell 157, 795-807 (2014) 
SEVEN DAYS


Baby-blood law 


Minnesota will once again 
allow blood spots collected 
from newborns to be kept 
and used for research, unless 
parents opt out. On 6 May, 
state governor Mark Dayton 
signed the controversial bill 
into law, reversing a 2011 
ruling by the state’s Supreme 
Court, which said that the 
practice violated state laws 
that require written, informed 
consent for the collection and 
storage of genetic information. 
The 2011 ruling allowed most 
blood spots to be stored for 
only 71 days to allow time for 
routine disease screening, and 
the state was forced to destroy 
more than 1 million samples. 
See go.nature.com/5ckmbm 
for more. 


Stanford axes coal 
Stanford University in
Palo Alto, California, will
no longer invest in coal-
mining companies from its
US$18.7-billion endowment
fund, it announced on
6 May. The move follows
the recommendation of the
university's advisory panel
of students, staff and alumni,
which reviewed the social and
environmental implications
of investing in fossil fuels.
Stanford is the largest of a
number of US universities
that have elected to remove
fossil-fuel stock from their
investments. The university
said that the value of its
investments in coal mining
was “small”.


On 8 May, Vermont became
the first US state to mandate
labelling of food containing
genetically modified
ingredients by July 2016.
Representatives of the food
and biotechnology industries
condemned the law, and the
US Grocery Manufacturers
Association in Washington DC
pledged to challenge it in
the federal court. Vermont's
attorney general said that he is
prepared to launch a vigorous
defence. More than 60
countries require labelling of
genetically modified foods.


The news in brief


FDA approves high-tech prosthetic arm


US regulators have approved the first
prosthetic arm that can perform complex
movements by picking up on electrical signals
sent to muscles by the brain. The US Food
and Drug Administration (FDA) in Silver
Spring, Maryland, gave the DEKA Arm
System (pictured) the green light on 9 May.
The device uses electrodes to detect electrical
activity caused by muscle contraction near the
prosthesis. The arm enables some amputees
to perform more-complex activities, such as
using keys and locks and preparing food, than
are possible with current prosthetic technology.
The system was a test case for a fast-track FDA
programme announced in 2011 to speed up
approvals for medical technologies.


Misconduct verdict
Haruko Obokata, a stem-cell
scientist at Japan's RIKEN
Center for Developmental
Biology in Kobe, who was
charged with research
misconduct, has lost her
appeal to have her case
reviewed. The RIKEN institute
confirmed on 8 May that it
has advised Obokata to retract
two papers she published
in Nature describing a new
method to reprogram cells
to an embryonic state. On
7 May, an investigation
committee advised RIKEN
to deny Obokata’s appeal. See
go.nature.com/rttlvk for more.


Animal wrongs
An alliance of ten
leading German research
organizations has spoken
out against animal-rights
activists who are targeting
neuroscientist Andreas Kreiter
at the University of Bremen.
Kreiter conducts research
on monkeys. On 7 May, the
alliance said it “expressly and
decisively condemns” an
advertising campaign that
personally attacks Kreiter and
that suggests he, and other
animal experimenters, should
not be thought of as human.
See page 259 and
go.nature.com/Izphx5 for more.


Open doors 


Scientists who work with
animals in the United
Kingdom have pledged
to be less guarded about
their activities. On 14 May,
72 organizations including
universities, charities, drug
companies and government
funders released a ‘concordat’
committing to more openness
than in the past. On 1 May,
the UK government proposed
jettisoning a rule that has
prevented it from releasing
much of the information it
holds on animal research (see
go.nature.com/Zzijvk2).


US climate changes
Climate change is already
affecting the United States,
warns a 6 May report from
the nation’s government.
The country’s third national
assessment of climate change
impacts says that rising
greenhouse-gas emissions
have made US summers longer
and winters shorter, and have
upped the risk of extreme
weather events. John Holdren,
President Barack Obama's
chief science adviser, called
for “urgent action to combat
the threats to Americans from
climate change”. See
go.nature.com/frmuuz for more.


Biosafety law
The German government
should bring in a new law to
regulate potentially dangerous
bioscience research, advised
a national ethics council in
a report released on 7 May.
Such dual-use research,
which includes studies on
lethal pathogens and toxins,
should be subject to approval
by a federal, interdisciplinary
commission of experts, the
council said. The report also
recommended that German
universities and research
organizations should set up
a national code of conduct.
The German government
commissioned the report two
years ago.


Beagle 2 leader dies
British planetary scientist
Colin Pillinger (pictured),
best known for his role as
lead scientist on the Beagle 2
mission to Mars, died on
7 May, aged 70. Beagle 2
lost touch with Earth after
reaching the red planet
on Christmas Day 2003,
but the mission propelled
Pillinger into the limelight
and he quickly became an
ambassador for UK space
science. He started his career
studying Apollo-mission
lunar samples at NASA, then
worked at the University
of Cambridge, UK, before
moving to Britain’s Open
University in Milton Keynes,
where he spent 35 years.


TREND WATCH
A digital compendium of
proteins and other biomolecules
has surpassed 100,000 entries,
with the release of 219 new
structures on 14 May. The
Protein Data Bank (PDB)
was started in 1971 to store
three-dimensional structural
data down to the atomic level.
Then and now, scientists
mapped most proteins using
X-ray crystallography, but
they are increasingly using
other tools, such as nuclear
magnetic resonance and electron
microscopy. See also page 260.


NOAA chief 


Oceanographer Richard 
Spinrad is the new chief 
scientist at the US National 
Oceanic and Atmospheric 
Administration (NOAA). 
Spinrad, appointed by 
President Barack Obama 
on 8 May, is the first person 
to hold the job since 1996. 
Congress blocked Obama's
first attempt to revive the chief
scientist slot in 2010. Spinrad
is no stranger to NOAA: from
2005 to 2010 he served as
an associate administrator
there overseeing oceanic and
atmospheric research.


FACILITIES 


European lasers 


A third facility in eastern 
Europe's Extreme Light 
Infrastructure (ELI), a 
network that will allow 
scientists worldwide to 
probe the frontiers of laser 
science, received a funding 
green light on 8 May. The 
European Commission 
approved €111 million 
(US$153 million) from 

the European Regional 
Development Fund so that 
Hungary can build the ELI 
Attosecond Light Pulse 
Source near the University of 
Szeged. The fund, designed to 
help poor regions to improve 
their infrastructures, has 
already paid for the ELI’s 

two other pillars — the ELI 
Nuclear Physics facility 

near Bucharest and the 

ELI Beamlines facility near 
Prague. 


Lost at sea
A US$8-million deep-sea-research
craft belonging to the
Woods Hole Oceanographic
Institution in Massachusetts
has been wrecked at sea. The
unmanned vehicle Nereus was
lost 9,990 metres under water
while exploring the Kermadec
Trench off New Zealand on
10 May. Crew members from
the ship Thomas G. Thompson,
who were operating Nereus,
later recovered debris from the
sea surface. The submersible
seems to have imploded, the
institute said in a statement.
See go.nature.com/qiwrmd for
more.


ONE HUNDRED THOUSAND PROTEIN STRUCTURES
Biomolecular structures stored in the Protein Data Bank are getting
bigger and more complex.


SEVEN DAYS | THIS WEEK


13-16 MAY
The United Nations
holds its first meeting
to address the problem
of lethal autonomous
weapons systems, or
‘killer robots’, in Geneva,
Switzerland. The
meeting will include a
debate between leading
robotics experts.
go.nature.com/lugvpj


22-25 MAY
Elections for the
European parliament
take place. Science
issues that may play a
part include support for
stem-cell research and
genetically modified
crops.
go.nature.com/lahzaj


BUSINESS
Pharma exchange
US pharmaceutical company
Merck announced on
6 May that it is selling its
consumer care business,
which includes over-the-counter
pharmaceuticals,
to Germany’s Bayer for
US$14.2 billion. Merck
will also pay Bayer at least
$1 billion to share in the
development, marketing
and profits of a class of
cardiovascular drugs
called soluble guanylate
cyclase inhibitors. Bayer is
developing the drugs to treat
heart failure and some forms
of pulmonary hypertension.
Stored packages at the Waste Isolation Pilot Plant were inspected for signs of damage after an accident in February.


NUCLEAR WASTE 


Call for better oversight of 
nuclear-waste storage 


Accident at US repository highlights need for tougher safety monitoring, say experts. 


BY DECLAN BUTLER 


A serious accident in February at the
United States’ only deep-storage repository
for nuclear waste might never have
happened had the government not disbanded a 
key independent scientific body charged with 
oversight of the safety of the facility. 

The Waste Isolation Pilot Plant (WIPP), 
carved out of a salt bed 655 metres below the 
desert near Carlsbad in New Mexico, is run by 
the Department of Energy (DOE) and stores
low- and medium-level military nuclear waste,
containing long-lived, man-made elements 
such as plutonium and americium. But there 
are politically controversial plans to store far 
hotter high-level waste at the site. Nuclear- 
waste experts say that the accident — in which 
a container is thought to have ruptured or 
exploded — along with management errors 
and a lack of oversight at WIPP, highlight the 
need for an independent risk assessment of any 
proposed expansion. 

The facility was opened in 1999 and is 
designed to operate for a few decades, after
which it will be sealed forever. The accident on 
14 February released moderate levels of radio- 
activity into the repository, as well as small 
amounts into the environment, and officials 
say that the plant will not reopen for at least 18 
months. 

According to a preliminary report released 
on 24 April by a DOE-appointed Accident 
Investigation Board, the root cause of the acci- 
dent lies with the department's field office and 
Nuclear Waste Partnership, the contractor 
that operates the site, both in Carlsbad.
They failed to identify radiological risks 
and make plans to control them, the report’s 
authors said. They added that maintenance of 
safety systems was neglected, and that DOE 
oversight was “ineffective”. 

The report’s findings are in sharp contrast to 
WIPP’s past record as a model of how to safely 
design and operate a deep geological waste 
repository. Many scientists attribute that rep- 
utation to the tough oversight provided until 
2004 by the Environmental Evaluation Group 
(EEG), a scientific body that was set up in 1978 
and charged with protecting public health and 
the environment. 

The EEG was staunchly independent of the 
DOE, and its technical expertise and author- 
ity were widely viewed as key to the public 
and political trust that the repository won. 
The Blue Ribbon Commission on America’s 
Nuclear Future, a government scientific advi- 
sory group, said in its 2012 report that the EEG 
“provided an independent and credible source” 
of information and review of WIPP. 

But in 2004, with WIPP by then fully opera- 
tional, the group was defunded and disbanded. 
Responsibility for oversight moved primarily 
to the New Mexico Environment Department 
in Santa Fe and the US Environmental Protec- 
tion Agency. “With the demise of EEG, inde- 
pendent technical and scientific oversight and 
transparency of WIPP has diminished,” says 
George Anastas, a radiation and nuclear-safety 
consultant, and a former EEG staff scientist. 

The accident report put into focus some of 
the decisions taken since the changeover. 

In 2006, for example, WIPP watered down 
a requirement that all waste containers have 
their contents analysed to characterize the type 
of waste they hold and to verify that they do 
not contain flammable, corrosive or reactive 
materials. This might have had a direct bearing 
on the accident. Although the ultimate cause 
has yet to be determined, an inspection on 
30 April ruled out a roof or wall collapse, and 
experts say that photographs showing evidence 
of heat damage in panel 7, where the accident 
occurred (see ‘Deep trouble’), are consistent 
with an explosion of one or more containers. 

And in 2009, WIPP eliminated 15 of the 
22 potential accidents it had previously been 
required to withstand, “without any clear jus- 
tification’, the report said. 

Under the EEG’s watch, the reduction in 
postulated accidents would not have hap- 
pened, says Lokesh Chaturvedi, an engineer- 
ing geologist who was deputy director of the 
EEG from 1982 to 2000. “I have no doubt in my 
mind that had EEG continued, the standard for 
inspecting the drums before shipping to WIPP 
would not have been diluted,” he adds. James 
Channell, an environmental engineer and 
health physicist who worked at the EEG for 
21 years, says that the accident and the report 
highlight the need to immediately reinstate an 
oversight body akin to the EEG. That body’s
first job should be to carry out an independent
review of the accident, he says.


DEEP TROUBLE
The Waste Isolation Pilot Plant is carved out of a layer of salt that will eventually encapsulate the stored low-
and medium-level nuclear waste. It consists of eight waste-disposal panels at the southern end, where the
accident occurred, and a smaller experimental wing at the northern end.
Experimental area includes labs studying waste science, mass of the neutrino and dark matter.
Filled | Active disposal | Mining under way

Channell adds that proposals to expand 
WIPP’s remit from storing just defence-related 
low- and medium-level waste to include high- 
level waste also should not be approved with- 
out rigorous scientific evaluation. 

One such proposal has been floated by the 
US National Nuclear Security Administra- 
tion for the disposal of 34 tonnes of high- 
level, weapons-grade plutonium. A report 
released on 29 April by the agency concluded 
that storage at WIPP was the cheapest of several
options, costing $8.8 billion. Other
plans aim to store high-level spent fuel
from nuclear power plants, which is much
hotter than the high-level military waste.
The proposals stem from another repository
problem: the government had originally
planned to store high-level
waste in a facility at Yucca Mountain in Nevada,
but the project was mothballed in 2010.

Research and scientific consensus on the
safety of storing very hot high-level waste in salt
beds are lacking. In particular, the effects of high
temperatures on the salt are not well characterized.
Heat might draw water out of salt crystals
towards the waste, for example, potentially creating
danger from steam and pressure, says Don
Hancock, director of nuclear-waste safety at the
Southwest Research and Information Center, a
watchdog group in Albuquerque, New Mexico.
Figure label (‘Deep trouble’): The waste-disposal panels hold almost 90,000 cubic
metres of waste.


© 2014 Macmillan Publishers Limited. All rights reserved


The DOE and other scientists argue,
however, that such salt, heat and water interactions
may be inconsequential, and that the
heat might actually have the positive effect 
of driving moisture out of repositories. Heat 
also speeds up salt creep, which could help to 
encapsulate waste faster, but might also create 
operational problems. Only field experiments, 
some of which have started at WIPP, will be 
able to definitively demonstrate that salt is a 
safe medium for storing high-level waste that 
generates large amounts of heat, they say. 

Ed Lyman, a nuclear expert with the Union 
of Concerned Scientists in Washington DC, 
says that he strongly supports exploring the 
storage of down-blended weapons-grade plu- 
tonium at WIPP. Such waste generates much 
less heat than does spent fuel, he adds. But he 
rejects storing spent fuel at WIPP, as its likely 
impacts on the surrounding salt “would be 
inviting trouble”. 

The DOE Field Office in Carlsbad and the 
Nuclear Waste Partnership had not responded 
to Nature when this article went to press. 

Several scientists say that whatever the test 
results or arguments, the storage of high-level 
waste at WIPP should be ruled out because 
of the nature of the site. The area is rich in oil, 
gas and minerals, and oil and gas wells hug the 
41-square-kilometre area. Hydraulic fracturing 
— fracking — of gas is also carried out nearby. 
This poses the risk that the WIPP repository 
could be disturbed by future drilling and min- 
ing, for example, by the puncture of the high- 
pressure brine reservoirs beneath WIPP. 

There is no way that the authorities would 
ever approve such a site for storing high-level 
waste, says Chaturvedi.


SOURCE: WIPP 
Microbiome therapy gains
market traction


Wave of investment suggests drugs from body-dwelling bacteria are heading for the clinic.


BY SARA REARDON 


The human body teems with trillions
of microorganisms — a microbial
landscape that has attracted roughly
US$500 million in research spending since 
2008. Yet with a few exceptions, such as the 
use of faecal transplants for treating life- 
threatening gut infections or inflammatory 
bowel disease, research on the human micro- 
biome has produced few therapies. 

That is poised to change as large pharma- 
ceutical companies eye the medical potential 
of manipulating interactions between humans 
and the bacteria that live in or on the body. 

On 2 May, drug giant Pfizer announced plans 
to partner with Second Genome, a biotechnol- 
ogy firm in South San Francisco, California, to 
study the microbiomes of around 900 people, 
including those with metabolic disorders and 
a control group. “We are looking at using this 
as one piece of a puzzle to understand an
individual,” says Barbara Sosnowski, vice-president
of external research and development at Pfizer 
in New York. A day earlier, Paris-based Enter- 
ome revealed that it had raised €10 million 
(US$13.8 million) in venture capital to develop 
tests that use the composition of gut bacteria 
to diagnose inflammatory and liver diseases. 

Experts predict that the next few months 
will see a boom in such partnerships and 
investments, and that new microbiome- 
derived drugs and therapies will come to mar- 
ket within a few years. 

Probiotics, or beneficial gut bacteria, have 
become a popular therapy in recent years. Tele- 
vision advertisements feature celebrities touting 
Bifidobacterium-laced yogurt, and consumers 
flock to buy pills that contain Lactobacillus to 
quell their gut disturbances and other ailments. 
But many physicians and scientists doubt the
effectiveness of such remedies. “Probiotics may
be relatively safe, but not particularly potent in
terms of modifying diseases or symptoms,” says
Joseph Murray, a gastroenterologist at the Mayo
Clinic in Rochester, Minnesota.

Researchers are studying how gut bacteria such
as Lactobacillus (grey) interact with the body.

But as scientists come to understand the 
mechanisms by which specific bacteria 
affect the body, many think that they can 
pinpoint the right combination of microbes 
to treat different conditions. Others aim to 
develop molecules that mimic a beneficial 
bacterium–host interaction, or block a harmful
one. “Undoubtedly, the microbiome is a
little drug factory in our intestine,” says Justin 
Sonnenburg, a microbiologist at Stanford Uni- 
versity in Palo Alto, California. 

Murray’s group, for example, has reported 
that feeding the gut bacterium Prevotella
histicola to transgenic mice engineered to 
have human-like immune systems can sup- 
press the inflammation caused by multiple 
sclerosis and rheumatoid arthritis. His team 
is hoping to develop this into a therapy with 
biotech firm Miomics in New York. Similarly, 
Vedanta Biosciences in Boston, Massachusetts, 
is conducting preclinical trials of a pill contain- 
ing microbes that suppress gut inflammation 
(Y. Furusawa et al. Nature 504, 446-450; 2013). 
And last June, Second Genome announced 
a deal with Janssen Pharmaceuticals of Beerse, 
Belgium, to study the microbial populations 
of people with ulcerative colitis, in the hope 
of identifying new drugs and drug targets. 
Although Second Genome remains vague 
about the details of its products, president Peter 
DiLaura says that the company hopes to find 
small molecules and biological compounds 
such as proteins that can tweak the microbiome
to ease diabetes and autoimmune disorders. 
Meanwhile, one of Second Genome’s 
scientific consultants, bioengineer Michael 
Fischbach of the University of California, 
San Francisco, is developing tools to identify 
molecules found on bacteria or produced by 
them, and which bind to receptors on human 
cells and affect the immune or nervous 
systems. “It’s not just a drug-like molecule — 
it’s a real drug being produced,” he says. 
Changing the balance of ‘good’ and ‘bad’ 
bacteria in the gut microbiome can also influ- 
ence health — inflammation, for example, 
or even depression and anxiety. Researchers 
may already have a wealth of ready-made 
medications that can alter this equilibrium. 
Drugs and small molecules that have been 
discarded because they are not absorbed by 
the intestine may help to target the gut micro- 
biome specifically, treating it as an organ. 
Sonnenburg’s team, for instance, has found
that a compound called sialic acid builds up 
in the intestine and helps harmful bacteria 
to take over the gut when antibiotics have 
killed off helpful bacteria. The researchers 
are now investigating whether treating 
mice with compounds similar to sialic acid 
can inhibit this harmful transformation 
(K. M. Ng et al. Nature 502, 96-99; 2013). 
And Microbiome Therapeutics, a bio- 
technology company in Broomfield, Colo- 
rado, is currently conducting clinical trials 
with two small molecules that select for 
‘good’ gut bacteria to help people with dia- 
betes to take up insulin more easily. Chief 
executive Steven Orndorff says that the 
company plans to present the first results 
from the trials next month at an Endocrine 
Society conference in Chicago, Illinois. 
Other companies are turning the micro- 
biome into a diagnostic tool. Enterome has 
created a genetic-sequencing platform that 
detects changes in stool microbes that warn 
of the onset of disorders such as inflamma- 
tory bowel disease. The firm has tracked pro- 
gression of the disease in 100 such patients in 
a bid to avoid invasive colonoscopies. 
Getting microbiome-inspired therapies 
to market presents a number of challenges, 
however. Small molecules such as those 
developed by Microbiome Therapeutics may 
be able to go through the normal drug regu- 
latory pathway. But there may be a different 
or new set of regulatory hurdles for geneti- 
cally modified bacteria — for example, those 
in development by Ghent-based ActoGeniX 
in Belgium and ViThera Pharmaceuticals in 
Cambridge, Massachusetts — that deliver 
anti-inflammatory agents to the gut. Other 
issues, including intellectual-property rights 
for naturally occurring bacteria, may com- 
plicate the path of products to market. 
Although small start-up firms can be flex- 
ible in navigating these issues, funding and 
guidance from pharmaceutical giants can 
only help, says Bernat Olle, chief operating 
officer of Vedanta. 


In 2013, for example, [...] New Brunswick, New
Jersey, to develop
potential therapies for inflammatory bowel 
disease and other autoimmune disorders. 
Pierre Belichard, Enterome’s chief execu- 
tive, says that such investment has been a 
long time coming — but companies are 
now flocking to microbiome research. 
“Doctors have been asking questions about 
why this new and fascinating world of 
science is not seen as a place to put money 
in,” he says. “Until the beginning of this
year, that was a very good question.” Now,
he says, investors “all want a microbiome
company in their portfolio”.
A sensor-equipped mooring that measures the strength of the Atlantic Ocean’s overturning currents.


OCEANOGRAPHY 


Atlantic current 
strength declines 


But more data are needed to indicate whether the slowing 
is a result of human-induced climate change. 


BY QUIRIN SCHIERMEIER 


The marked slowdown in the past decade
of the warm Atlantic Ocean currents
that bring mild weather to northwest- 
ern Europe may be caused by natural variation 
and not anthropogenic climate change, as has 
been previously suggested. 

The Atlantic Meridional Overturning Cir- 
culation (AMOC) is part of the great ocean 
‘conveyor belt’ that ceaselessly circulates sea 
water, heat and nutrients around the globe. In 
particular, it transports large amounts of warm 
water from the tropics to the poles, warming 
the British Isles and maritime northern Europe 
along the way (see ‘Current affair’). But since 
2004, ocean sensors have detected a significant 
decline in the strength of the currents¹ and a
cooling of the subtropical Atlantic as a result².
From mid-2009 to mid-2010, for example, the 
circulation slowed to two-thirds of its usual 
strength — and some oceanographers sug- 
gested that the drop caused the harsh weather 
in the United Kingdom and western Europe 
that winter (see Nature 497, 167-168; 2013). 
Climate scientists had speculated that the
slowdown is linked to man-made climate 
change. But an analysis presented last month 
by a team of British scientists at the annual 
assembly of the European Geosciences Union 
in Vienna suggests that the AMOC’s slowing 
could just be part of natural oceanic fluctua- 
tions. The researchers added, however, that it 
will take more long-term monitoring to defini- 
tively rule out climate change as a factor. 

Scientists think that the AMOC might be 
subject to abrupt changes that have probably 
played a part in ancient climate events, such 
as the sudden temperature swings 18,000 to 
80,000 years ago during the last glacial period. 
The AMOC’s main engine — the sinking of 
cold, dense water to the bottom of the North 
Atlantic — has been identified as a potential 
‘tipping element’ in Earth's climate system, 
in which small climate perturbations could 
push the system past a critical threshold, with 
potentially large consequences for humans and 
ecosystems³.

Since 2004, 22 moored sensors have been
deployed between the Canary Islands and
Florida along the latitude line at 26.5° north
— where the AMOC emits its maximum
heat. The sensor array, known as the RAPID
Climate Change monitoring array, has continuously
monitored the strength and temperature
of the current at different depths.


BEN MOAT


CURRENT AFFAIR
Monitoring with the RAPID Climate
Change array has revealed that the
strength of the Atlantic Meridional
Overturning Circulation current, which
transports warm surface water to the
poles (orange) and cool deep water to
the tropics (blue), is declining.
Existing RAPID monitoring array.

RAPID measurements previously revealed¹
that the circulation weakened by 3% per year 
on average between 2004 and 2008, with a 
mean strength of 17.5 million cubic metres 
per second. Most of the past decade's observed 
decline occurred between April 2008 and 
March 2012, when the AMOC was around 
15% weaker on average than in the previous 
four years. The measurements also showed 
that the strength of the currents varied by up 
to 70% from year to year, depending on wind 
and seawater temperature. 

To find out whether the observed long-term 
decline lies within the range of natural yearly 
fluctuations, Chris Roberts, a climate scientist 
at the UK Met Office's
Hadley Centre in Exeter who led the
latest analysis, compared the observed
trend with estimates
of circulation strength derived from 14 state- 
of-the-art climate-ocean models. If the vari- 
ability in modelled circulation strength were 
to differ substantially from observed trends, 
it could suggest that the decline is down to an 
external forcing factor such as climate change. 

Although the results suggested that the 
downward trend is extremely unusual, Rob- 
erts knew that models can substantially under- 
estimate the actual year-to-year variability 
in the strength of the AMOC. When he and 
his team adjusted the models to incorporate 
more-realistic natural fluctuations, the down- 
ward trend was statistically in line with the 
expected variations. Even if the slowing con- 
tinues at the current rate, the trend will not 
differ significantly from plausible estimates of 
natural variability for 18 more years, the team 
concluded. But it will take at least 10 more 
years of continuous observation to detect any 
influence of man-made climate-change effects, 
says Roberts. 
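
The logic of that comparison can be illustrated with a toy calculation. The sketch below is not the team's method or data: it assumes a hypothetical AR(1) noise model for unforced year-to-year variability, with made-up parameters (`sigma`, `phi`) and a made-up decline (`observed`), and simply counts how often chance alone produces an 8-year trend at least as steep.

```python
import random


def linear_trend(series):
    """Least-squares slope of a yearly series (units per year)."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_x = sum(series) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return cov / var


def simulate_unforced_trends(n_runs, n_years, sigma, phi, rng):
    """Trends of synthetic AR(1) series with no imposed long-term change.

    sigma (noise standard deviation) and phi (lag-1 autocorrelation) are
    hypothetical values, not numbers taken from the RAPID analysis.
    """
    trends = []
    for _ in range(n_runs):
        x = [0.0]
        for _ in range(n_years - 1):
            x.append(phi * x[-1] + rng.gauss(0.0, sigma))
        trends.append(linear_trend(x))
    return trends


def fraction_at_least_as_steep(observed_trend, unforced_trends):
    """Share of unforced trends that decline at least as fast as observed."""
    hits = sum(1 for t in unforced_trends if t <= observed_trend)
    return hits / len(unforced_trends)


rng = random.Random(0)
observed = -0.5  # hypothetical decline, million cubic metres per second per year
low_var = simulate_unforced_trends(2000, 8, sigma=0.5, phi=0.3, rng=rng)
high_var = simulate_unforced_trends(2000, 8, sigma=2.0, phi=0.3, rng=rng)
print("low assumed variability: ", fraction_at_least_as_steep(observed, low_var))
print("high assumed variability:", fraction_at_least_as_steep(observed, high_var))
```

Under these assumptions, the same decline matches a far larger share of chance trends when the assumed natural variability is larger, which is the qualitative effect of adjusting models that underestimate year-to-year variability.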

“There’s nothing at the moment that would 
suggest that something dramatically worrying 
is going on,” says David Smeed, an oceanogra- 
pher at the UK National Oceanography Cen- 
tre in Southampton and a lead researcher in 
the RAPID programme. He suggests that the 
weakening of the AMOC could be because of 
the Atlantic Multidecadal Oscillation — a nat- 
ural cycle of ocean variability in which Atlantic 
temperatures dip every 60 to 70 years. 

RAPID, which was funded by the Natural 
Environment Research Council in Swindon, 
UK, was last year extended to run until 2020. 
Another array, funded mainly by UK and 
US science agencies, will be deployed this 
summer in the North Atlantic between Lab- 
rador, Greenland and Scotland to monitor the 
AMOC in subpolar regions. Together, data 
from the two arrays should help to explain the 
mechanisms behind the changes in circula- 
tion, says Susan Lozier, an oceanographer at 
Duke University in Durham, North Carolina, 
especially because the subpolar array is along 
a similar latitude to the main driver for the 
Atlantic Ocean circulation system. 

Regardless of the cause of the AMOC’s 
decline, if the trend persists “it could have sig- 
nificant consequences for society” in terms 
of the climate in northwestern Europe, says 
Roberts. Nevertheless, being able to predict the 
strength of the current could help to improve 
short-term regional climate forecasts, he says.


1. Smeed, D. et al. Ocean Sci. 10, 29-38 (2014).
2. Cunningham, S. A. et al. Geophys. Res. Lett. 40, 6202-6207 (2013).
3. Lenton, T. M. et al. Proc. Natl Acad. Sci. USA 105, 1786-1793 (2008).
FETCH!
NASA hopes to bring the first soil and rocks back from Mars. The process is set to
begin in 2020, when the agency's next rover is slated to cache samples for return
by two future missions, as-yet unplanned.
Rover caches samples.
Second rover fetches samples to be sent into orbit.
Vehicle grabs orbiting samples and returns to Earth.

PLANETARY SCIENCE 


NASA plans Mars 
sample-return rover 


Agency to narrow down list of landing sites for 2020 mission. 


BY ALEXANDRA WITZE 


NASA’s Curiosity rover is in the prime
of its life, exploring the rocks, soil and 
air of Mars. But the agency is already 

planning its successor — and this time, the 

scientific stakes are higher. 

On 14 May, planetary geologists will gather 
in a hotel near Arlington, Virginia, to begin 
hammering out where NASA might send its 
next Mars rover, set to launch in 2020. The plan 
is to build a machine that is nearly identical to 
Curiosity, and equip it with fresh instruments 
to probe the Martian surface. 

Although NASA has yet to finalize details, 
the next rover will almost certainly have a 
hugely important, unprecedented job: to col- 
lect and store rocks and soil for a future space- 
craft to bring back to Earth. It would be the 
first ever sample return from Mars. 

“The next 20 years of Mars exploration hinges
on where this rover goes,” says Philip Christensen,
a planetary scientist at Arizona State
University in Tempe. “It has to tell us something
fundamental about the broader history of Mars.”

NASA’s workshop this week will discuss possible
landing sites. Many look familiar: they were
on the longlist of sites for Curiosity’s landing in 
2012. Such locations include Mawrth Vallis, an 
ancient valley strewn with minerals formed in 
water, which would help with the rover’s main 
goal of finding and exploring environments that
could once have been suitable for life. The Euro- 
pean Space Agency is also considering the site 
for its ExoMars rover, which will launch in 2018 
(see Nature 508, 19-20; 2014). 

Other possibilities for 2020 include several 
ancient, now-dry lakes and deltas where flow- 
ing water once laid down sediment. These 
areas, including Eberswalde Crater, were 
among the top candidates for the Curiosity mis- 
sion. They were passed over in favour of Gale 
Crater, where the rover is laboriously trekking 
towards a 5-kilometre-high mountain of sedi- 
ments. Curiosity has yet to detect concentrated 
amounts of organic material, but the rich river- 
laid sediments in Eberswalde are likely to offer 
that bounty, says geologist Ross Irwin of the 
Smithsonian Institution in Washington DC. 

The 2020 rover will also have the crucial extra 
task of collecting samples. Scientists have talked 
for decades about getting their hands on Mar- 
tian rocks to look for signs of past life. They have 
studied meteorites that originated on Mars, but 
no space agency has yet been able to bring back 
samples directly, in part because of the cost and 
in part because of technical failures (see Nature 
479, 275-276; 2011). 

NASA’s plan for bringing back Martian
samples would involve a succession of mis- 
sions over many years (see ‘Fetch!’). Step 
one would need a rover to collect and store 
roughly 30 narrow cylinders of rock and soil,
either on board or on the ground. In step two, 
an unmanned rocket would fly to Mars and 
deploy another rover to fetch the samples and 
then blast them into orbit. Step three would 
be to capture that orbiting package and fly it 
back to Earth. 

Being able to look at a chunk of rock from 
a particular location and understand its 
context would be a crucial step forward, says 
John Mustard, a planetary geologist at Brown 
University in Providence, Rhode Island. “If 
samples were returned, the science that could 
come out of that would be equivalent to when 
the Apollo samples came back from the Moon,” 
he says. “It changes everything.”

Some scientists hope that the 2020 rover 
will visit new locations, whereas others want 
to return to sites explored by previous rovers 
such as Curiosity or Spirit. Both approaches 
have benefits, says Dawn Sumner, a geologist 
at the University of California, Davis. “With 
a new site there are more unknowns, but that 
means we are likely to learn more about the 
diversity of Mars as a planet,” she says.

Much depends on what instruments the 
rover will carry. Fifty-eight teams have submit- 
ted proposals; in July or later, NASA will select 
a handful of designs. Curiosity’s tools include 
an instrument that it deployed on 5 May to 
drill into a sandstone rock named Windjana. 
The rover has already drilled at two other sites, 
which yielded signs that an ancient lake bed 
once existed at Gale Crater. 

The destination of the 2020 rover will not be 
constrained by science alone: engineers must 
be able to manoeuvre the craft to the ground 
safely. They may use a variant of Curiosity’s 
‘sky crane’ descent, in which thrusters guided 
the rover to a precise location. 

This week’s workshop is the first step in 
narrowing down the list of landing sites, but 
the final decision might not be made until 
2019. “It will be a tremendously interesting and 
fascinating time,” says Matthew Golombek, a 
planetary geologist at the Jet Propulsion Labo- 
ratory in Pasadena, California, who is leading 
the site-selection process. “The most impor- 
tant thing for this spacecraft is not so much 
to learn about the rocks on Mars, but to learn 
enough to know if those rocks have the stuff in
them that you want to bring back to Earth.”


CORRECTION 

In the News story ‘Doubts over heart stem- 
cell therapy’ (Nature 509, 15-16; 2014), 
the low-oxygen method for preparing 
mesenchymal stem cells was erroneously 
attributed to a Moscow research institute. 
The Moscow institute developed the 
experimental concept, but the low-oxygen 
refinement was developed in the United 
States. Mention of the institute has therefore 
been removed online. 


IMAGES OF EARTH AND MARS SURFACE: NASA 


Under siege


A wave of anti-gay laws and homophobia
in Africa is hampering efforts to study and
curb the spread of HIV.


BY LINDA NORDLING


On the morning of Saturday 12 April, ten police officers raided Maaygo, a
men’s health and HIV/AIDS advocacy organization in a residential area of
Kisumu in western Kenya.
Staff watched helplessly as the officers confiscated information leaflets
and even the model penis used in condom demonstrations. The police
arrested the organization's director and finance officer, as well as one of
its members, for “illegally practising sexual orientation information”.

The detainees were released later that day after Daniel Onyango,
director of a local homosexual-rights group, arrived at the police station
and explained that it was not actually a crime to distribute information
on sexual orientation. Maaygo will continue its operations, albeit from
a less conspicuous location in Kisumu.

An HIV treatment and research project in Uganda was not so lucky.
On 3 April, the Makerere University Walter Reed Project in Kampala
suspended its operations indefinitely after a staff member was arrested on
charges of recruiting homosexuals and carrying out ‘unethical research’.

Stories such as these are becoming increasingly common across
Africa, where vocal protests against homosexuality are on the rise.
Many countries, including Ethiopia, Nigeria and Senegal, have stringent
anti-homosexuality laws. In February, Uganda toughened its
stance, passing harsh new measures that hand
down life sentences to people convicted of having gay sex and that stipulate
up to seven years in jail for actively supporting the rights of gay people.

Antipathy towards homosexuality has hampered efforts to curb HIV 
in these countries. The Joint United Nations Programme on HIV/AIDS 
(UNAIDS) has identified men who have sex with men (MSM) as a key
risk group for HIV infection, but because of cultural prejudices, gay peo- 
ple in Africa are often unable to access information on how to protect 
themselves from HIV, and those who become infected are often denied 
treatment. Homophobia and criminalization are also impeding research 
on MSM and HIV transmission. 

“There are several examples where research has been stopped or
slowed based on these laws,” says Stefan Baral, an epidemiologist and
physician at Johns Hopkins Bloomberg School of Public Health in
Baltimore, Maryland. He has seen the problem at first hand, having
conducted research with MSM in several African countries over the
past decade. “What you end up with then is a data paradox, where you
know the least in the places with the most stigma,” he says.

Kenyan activists in Nairobi protest against Uganda’s anti-gay law.

Researchers and clinicians say that there are a few small signs of hope. 
In Kenya, for example, one clinic has managed to use education to curb 
anti-gay sentiments, providing a model that might be successful else- 
where. But that relative stability could easily crumble with changes in 
the local community or in the country’s leadership. 


FORBIDDEN RESEARCH 

The tide of homophobia has hit particularly hard in Uganda, says Paul 
Semugoma, a gay physician and researcher from that country who left 
two years ago and now lives in exile in South Africa. 

“Uganda is the worst, it’s a witch-hunt,” he says. A research project was 
supposed to start there this year to identify groups at risk of HIV, but it 
was suspended because of concerns for the safety of the researchers and 
participants, he says. When it comes to MSM, he adds, “what's going 
to happen is that there's not going to be any up-to-date information on 
this risk group”. 

Similar problems are plaguing research in Ethiopia, where same-sex 
encounters are punishable by up to 15 years in prison. Researchers are 
kept from studying MSM and HIV by the Ethiopian Public Health Insti- 
tute, which must approve medical research in the country. 

A programme run by the US Centers for Disease Control and Pre- 
vention and the Ethiopian Public Health Association managed to pass 
the screening process in 2011 because it used terms such as ‘most at-risk 
populations’ rather than MSM or gay, says an Ethiopian advocate for gay 
and transgender health and human rights, who lives in exile in the United 
States and asked not to be named because of concerns about the safety of 
his family and friends. Once the government found out that the project 
would target MSM and related groups, the research was stopped, he says. 

The problem is not limited to countries where homosexuality is a 
crime. “In many countries in sub-Saharan Africa, we just cannot do 
research on MSM-related topics,” says Lung Vu, an HIV and tuberculo- 
sis research adviser at Population Services International, a global-health 
organization in Washington DC. “Government officials and religious 
leaders just don’t allow for this to happen.” 

This is detrimental, he says, because data on MSM are needed to 
understand and combat HIV in Africa, where there is an epidemic of 
the disease. In an analysis¹ last year, Vu and his colleagues found that
the prevalence of HIV among MSM in three large Nigerian cities was 
between four and ten times that in the general population. Scraps of data 
from Senegal and other countries paint the same picture. 

Because there is so little information on MSM in Africa, researchers 
are only now starting to appreciate how much that population and other 
sexual minorities are spurring the spread of HIV on the continent, says 
Kent Klindera, director of an initiative at amfAR, the Foundation for 
AIDS Research in New York, that promotes HIV research and preven- 
tion for MSM and transgender individuals. 

Funding problems have also limited research in this area, says a 2013
amfAR report². Of the US$1.5 billion allocated to six African countries
(Botswana, Malawi, Namibia, Swaziland, Zambia and Zimbabwe) by
the Global Fund to Fight AIDS, Tuberculosis and Malaria since 2001,
only 0.07% went to research concerning MSM and transgender people. 
Four of the countries reported no dedicated MSM projects whatsoever. 
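
To give a sense of scale, the share quoted above can be turned into a dollar figure. The Python sketch below is only a back-of-the-envelope check: it uses the two numbers cited from the amfAR report (US$1.5 billion and 0.07%), and the variable names are illustrative rather than taken from the report.

```python
# Figures as quoted in the text from the 2013 amfAR report.
total_allocated_usd = 1.5e9   # Global Fund allocation to the six countries since 2001
msm_share_percent = 0.07      # share that went to MSM and transgender research

# Convert the percentage share into an absolute amount.
msm_research_usd = total_allocated_usd * msm_share_percent / 100
print(f"About US${msm_research_usd / 1e6:.2f} million")
```

On these figures, the research in question received roughly US$1 million in total across the six countries over more than a decade.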

Despite the problems, researchers in Africa point to a few bright spots. 
For example, Mtwapa, a coastal town north of Mombasa, Kenya, man- 
aged to calm a volatile situation through community engagement and 
education about how the research can help society at large. 

The trouble at Mtwapa centred on an HIV clinic run by the Kenyan
Medical Research Institute (KEMRI), which conducted risk-group studies
at the facility. On 12 February 2010, a mob of several hundred people
charged the clinic, incited by two religious leaders: a Christian bishop
and a Muslim imam.

The riot was based on misinformation. “It started with a rumour that two
gay men were getting married in the town,” says Eduard Sanders, an
epidemiologist with the University of Oxford,
UK, who has studied MSM in Mtwapa since 2005, and who witnessed
the riot. “But when the mob couldn’t find any hint of the wedding, it
descended on the clinic because of its well-known research on MSM.”
Armed with sticks, stones and other weapons, the crowd surrounded
the clinic, demanding that the gay men come out. Police arrested people
accused of being gay (possibly as a way of saving them from mob
justice) and later released them. One KEMRI volunteer was severely
beaten, according to the international group Human Rights Watch.

NATURE.COM
For a Nature special on research in Africa, see: nature.com/africa


ATTITUDE ADJUSTMENT 

Today, the clinic is safer, says Sanders, thanks to a campaign to inform 
local residents about its role in managing HIV in the area. As part of 
this effort, KEMRI hired community-liaison officer Evanson Gichuru, 
who met and interviewed local leaders and the people who organized 
the attack. Most of the antipathy, he observed, came down to misconcep- 
tions about gay people, such as the idea that they groom young boys to 
become prostitutes. Gichuru’s meetings, combined with a media cam- 
paign and health-worker training on MSM sensitivities, have brought 
a turnaround in attitudes. There have been no further attacks on the 
clinic, and research conducted after health-worker training found³ that
it significantly reduced homophobic tendencies, particularly among 
those who had previously scored a high rating on a homophobia scale. 

An important factor in changing attitudes was teaching people that 
the health of the wider community in Kenya depends on understanding 
and treating HIV among MSM, says Gichuru. Many men who have sex 
with men in Kenya also have sex with women, so a high HIV incidence
in this group puts society in general at risk. 

But neither Sanders nor Gichuru wants to overstate the success in 
Mtwapa. “I don’t want to be overconfident,” says Sanders. There is still
a big stigma to being gay in Kenya, he adds. 

And the techniques used to raise awareness there are not likely to 
work in Uganda, says Semugoma. “In Uganda the homophobia is not 
only from the people, it’s also coming from the state,” he says. 

Researchers point to Malawi as another example of positive devel- 
opments. A few years ago, the Centre for the Development of People 
(CEDEP), a human-rights organization that supports minorities includ- 
ing MSM, had to scale down its activities after health workers were 
arrested. But in 2012, partly thanks to CEDEP’s advocacy, Malawian 
President Joyce Banda suspended the country’s laws criminalizing 
homosexuality. The arrests have stopped and CEDEP now has five 
offices across Malawi. 

The raid on Maaygo’s office in Kenya last month worried many 
physicians and researchers, but it also provided an opportunity. In 
subsequent meetings with local authorities and the police, people 
from the organization and other human-rights groups discussed how 
HIV treatment and support for MSM are needed to fight the high HIV 
infection rates in Kisumu, says Onyango, who attended the first such 
meeting. “Currently all is well. We have not had any further incidents,” 
he says. “Maaygo are still doing their activities, although at a slow pace 
so that we can sensitize the community.”


Linda Nordling is a freelance writer in Cape Town, South Africa. 
The left-over radiation from the Big
Bang has given up what may be its last 
great secret about the early Universe, 
but astronomers are determined to mine 
more from this primordial prize. 


50 YEARS OF
DISCOVERY


Half a century after 
astronomers first 
detected the cosmic 
microwave background 
(CMB) radiation, it 
continues to be their 
clearest window on the 
early Universe. 


by Joanne Baker 


Cosmologists couldn't have wished for a better anniversary present.
Almost 50 years to the day after the first detection of the Big Bang’s 
afterglow — a faint glimmer of long-wavelength photons known 
as the cosmic microwave background (CMB) — the field has been 
galvanized by what may be the last major discovery from the radiation. 

On 17 March, astronomers announced that a microwave detector at 
the South Pole had recorded the first signs of primordial ‘B modes’: sub- 
tle, swirling patterns in the CMB data that were imprinted during the 
early history of the Universe. The result was hailed as direct evidence of 
gravitational waves, ripples in the fabric of space-time that were produced 
by a sudden ‘inflation’ of the Universe a split second after the Big Bang. 

The most obvious question — is the B-mode signal real? — has 
sparked a race among teams running telescopes on the ground, in space 
and carried by balloons. “The name of the game is confirmation,” says
experimental cosmologist Amber Miller of Columbia University in New 
York City. 

Should the results check out, and most in the field think they will, the 
focus will shift to the next frontier. Scientists would like to see a new era 
of B-mode astronomy that would collect more extensive and more pre- 
cise measurements of the patterns. Through such data, researchers hope 
to reach back in time to better understand the Universe's first moments, 
as well as how galaxies formed and clustered together in the aftermath 
of the Big Bang. The B-mode data may even help to reveal the origins of 
mysterious factors such as dark matter and dark energy that control the 
form and fate of the cosmos. 

“The CMB has been our best window on the early Universe by a long
shot,” says George Efstathiou, a cosmologist at the University of Cambridge,
UK. But a rich new B-mode era is not guaranteed. Funding is
scant, the existing surveys have little coordination with each other, and 
the available instruments are limited. What’s more, theorists still need 
to pin down exactly what the new views of the CMB can reveal. Even as
they celebrate this year’s discovery, CMB researchers are fretting over
the future of their field. Decisions made over the next few months will
determine whether astronomers can hope to realize the scientific promise
of this new vista in the next decade or more.


1946-1948
Several scientists predict that the Universe
should be filled with remnant radiation from
the Big Bang, and that this would have a
temperature of just a few kelvin.

1964
Arno Penzias and Robert Wilson detect
the CMB radiation and measure its
temperature to be roughly 3 kelvin.

1990
NASA's Cosmic Background Explorer (COBE)
satellite measures the CMB from space and
pins its temperature at 2.725 kelvin.

1992
COBE data reveal minuscule variations
in the CMB's temperature, a sign of
density fluctuations in the early Universe
that would later condense into galaxies.


1964, NASA; 1990, NASA/COBE SCIENCE TEAM; 1992, NASA
1999, BOOMERANG/NASA/NSF; 2013, ESA/PLANCK COLLABORATION


EARLY DAYS 

The discovery of the Big Bang’s afterglow came as a happy accident, 
when Arno Penzias and Robert Wilson, two astronomers at Bell Labs 
in Holmdel, New Jersey, set out to map radio emissions from the Milky 
Way. On 20 May 1964, they noticed a faint signal that seemed to come 
from every direction. Penzias and Wilson assumed it was an artefact 
from some local source until a conversation with a colleague led them 
to conclude that the radiation was not earthly but cosmic. 

Theorists, they learned, had long predicted such a signal: it was strong
evidence in favour of the Big Bang theory, which holds that the Universe
exploded into existence at a moment in the past, rather than having
existed forever in an unchanging ‘steady state’. By observing it, Penzias
and Wilson had proved that the Universe was once much hotter than it is 
today. The photons they had recorded were released about 380,000 years 
after the Big Bang, when the expanding cosmic fireball had cooled 
enough for electrons and protons to combine to form hydrogen atoms. 
The photons have been travelling ever since, cooling as the Universe 
expands and preserving a snapshot of the Universe at the moment they 
were liberated (see ‘50 years of discovery’). 

In 1990, NASA’s Cosmic Background Explorer (COBE) satellite made 
the first precise measurement of the CMB’s temperature, 2.725 kelvin,
and showed that the value was the same in every direction, implying
that the primordial plasma was similarly uniform¹.

But it soon became apparent that the CMB is not perfectly smooth. 
In 1992, COBE scientists found that the temperature of the CMB varies 
across the sky by roughly 1 part in 100,000 (ref. 2). These tiny ‘anisotro- 
pies’ turned out to provide crucial information about the evolution of 
the Universe. The hot and cold spots reflect small variations in the den- 
sity of the gas when the CMB photons were released. Most cosmologists 
think that gravity later magnified these fluctuations, pulling together 
denser regions to form galaxies and clusters of galaxies. 
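
To put the size of these variations in everyday units, the two numbers quoted above can be multiplied together. The Python sketch below is an order-of-magnitude illustration only, assuming the 2.725-kelvin mean temperature and the "1 part in 100,000" figure from the text; the variable names are illustrative.

```python
# Values quoted in the text.
cmb_temperature_k = 2.725            # mean CMB temperature measured by COBE, in kelvin
fractional_variation = 1 / 100_000   # "roughly 1 part in 100,000"

# Size of a typical hot or cold spot, converted from kelvin to microkelvin.
variation_microkelvin = cmb_temperature_k * fractional_variation * 1e6
print(f"Typical variation: about {variation_microkelvin:.0f} microkelvin")
```

The hot and cold spots therefore differ from the mean by only a few tens of millionths of a kelvin, which is why they went undetected until 1992.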

Seeing the anisotropies also inspired theorists, says Marc Kami- 
onkowski, a cosmologist at Johns Hopkins University in Baltimore, 
Maryland. A prime example was the recognition that the warm and cold 
blotches in the CMB have characteristic sizes determined by vast waves 
of pressure and density that reverberated through the infant cosmos in 
much the same way that sound harmonics echo inside a violin. These 
dominant frequencies, or acoustic peaks, in the CMB allow astronomers 
to infer many physical properties of the Universe. For example, the biggest
peak, akin to the loudest harmonic, lies at a scale of about 1°, or
about twice the diameter of the full Moon. This is exactly as would be 
expected if the expanding Universe is geometrically flat, so that parallel 
light rays never cross as they traverse space. The location and relative 
strength of the second peak, at roughly 0.4°, allows astronomers to infer 
that ordinary matter — the kind found in atoms, planets and stars — 
comprises less than 5% of the cosmic total. Everything else is in the form 
of invisible dark matter and dark energy. 


POLARIZATION PATTERNS 

CMB research entered a new phase a decade ago, with the advent of detec- 
tors sensitive enough to measure its polarization — the direction of vibra- 
tion in the photons coming from each point in the sky. Polarization in the 
CMB results from photons scattering off free-roaming electrons in the 
cosmic plasma, and the potential scientific pay-off from measuring it was 
huge: one component, the swirling B modes, promised to give astrono- 
mers the first direct evidence that the Universe had undergone an extreme 
form of inflation when it was just 10-*° to 10°” seconds old. Theorists 
proposed the idea in the early 1980s to explain why the Universe is both 
smooth at the largest scales and geometrically flat³. The rapid expansion,
in which the cosmos grew by a factor of at least 10°, would have smoothed
out most irregularities and flattened out any curvature. The few remain- 
ing irregularities — visible as the CMB temperature anisotropies — were 
vastly magnified vestiges of tiny quantum fluctuations in energy. 

But that was all theory until researchers developed the capacity to 
measure B modes. That required them to find a way to identify a minus- 
cule signal that is easily masked by polarized emissions from dust and 
magnetic fields in our Galaxy. The first detection wasn’t announced 
until 2013 (refs 4, 5) — and even then, the measurements were made 
on a small angular scale, at which polarization patterns are distorted by
the gravitational fields of galaxies in front of the CMB. 

The real prize arrived in March, when astronomers working with the 
BICEP2 detector at the South Pole announced that they had measured 
B modes on scales of about 1°, large enough to avoid the signal from 
intervening galaxies and to probe fundamental polarization patterns 
such as those from inflationary gravitational waves⁶.

After so many years of searching for inflationary B modes, BICEP2’s 
results triggered widespread elation in the cosmology community. “It has 
injected a whole lot of adrenaline into the endeavour,” says experimental
cosmologist Shaul Hanany of the University of Minnesota in Minneapolis. 

But with that excitement came a puzzle. The patterns detected by
BICEP2 are considerably stronger than most cosmological models
predicted. And it exceeds limits set by the European Space Agency’s
now-deactivated Planck satellite on the degree to which gravitational
waves might have contributed to the CMB temperature fluctuations.
“The BICEP2 result was a bit of a shock for me,” says Efstathiou, who
is a member of the Planck science team. “I think the jury is still out” on
what it means, or if it’s even real.


Balloon-borne detectors characterize CMB
fluctuations accurately enough for scientists to
do a statistical analysis, which reveals information
on the Universe’s geometry and energy content.

2003
NASA's Wilkinson
Microwave Anisotropy
Probe (WMAP) charts the
CMB in increased detail.

2013
Europe’s Planck satellite
picks up first hints of
gravitational waves from
the infant Universe.

The BICEP2 experiment at the South Pole
detects strong evidence of gravitational
waves in the CMB’s polarization.

Next-generation CMB observatories could use
the radiation to track galaxy evolution and
probe the earliest instants of the Universe.


In the next year, half a dozen experiments in Antarctica and Chile will 
try to confirm the findings. Members of the BICEP2 team are working 
on two new South Pole telescopes. The Keck Array, which is already 
operational, has five times as many detectors as BICEP2 and covers two 
frequency bands. A second, called BICEP3, is an upgraded version of the 
previous detector that is scheduled to start collecting data in December 
2014. One or two US-funded balloon-borne experiments may also fly 
later this year from McMurdo Station in Antarctica; last year’s flights 
were cancelled because of the US government shutdown. 

But cosmologists are most eager to see this autumn’s planned release 
of the full data set from Planck — findings that will include polariza- 
tion maps. Planck has the advantage of monitoring a wider range of 
frequencies than ground- and balloon-based experiments, which can 
measure only within the narrow bands of radiation frequencies that are 
not absorbed by water vapour in the atmosphere. 
Planck’s improved sight will give astronomers

will pose a challenge. To take just one example,
gravitational waves as strong as those recorded
by BICEP2 should have had a noticeable effect on the acoustic peaks,
yet the limited Planck data currently available have shown no evidence
for that.

“How can you reconcile it?” wonders Efstathiou. The ideas put forward
so far seem contrived, he says. Kamionkowski is more optimistic,
and counsels patience. “It may take years to really understand what the
most promising models are and how to distinguish them.”


NEXT-GENERATION EXPERIMENTS
In the meantime, most CMB scientists are focusing on developing their
capability for measuring B modes. For example, there are many theories
for precisely how inflation unfolded, each making its specific prediction
about how the gravitational wave B modes are distributed across
the sky. Being able to measure B modes at the largest scales would allow
astronomers to weed out the theories that are obviously wrong.

At smaller scales, the B modes are sensitive to how mass is distributed
around the Universe, and how vast galaxy clusters have grown over time.
Such a signal would help astronomers to constrain stubborn cosmological
unknowns, including the nature of dark energy (a mysterious force that
is causing the Universe's expansion to speed up) and the identity of the
invisible dark-matter particles that make up most of the Universe’s mass.

Combining B-mode maps with surveys of hydrogen across the Universe
could also allow observers to probe the epoch in which the first
stars and galaxies switched on their ionizing radiation. Electron scattering
from this period should have left a large-scale mark in the B-mode
polarization of the CMB.

Unfortunately, limited money is constraining choices about what
comes next. A UK competitor to BICEP2 was cancelled in 2009 (see
Nature http://doi.org/fnsdc3; 2009) as its funding council struggled to
meet commitments to international bodies such as CERN, Europe's
particle-physics laboratory near Geneva in Switzerland. Europe as a
whole, meanwhile, has focused its CMB research programme almost
exclusively on Planck, a policy that Efstathiou describes as a “big mistake”.
With no follow-on mission in the pipeline and little ground-based
activity, the concern is that hundreds of postdocs and students will have
to move to new fields when the research programme ends.

In the United States, CMB work falls into the cracks between grant-agency
panels. Space and balloon work competes against planetary
and X-ray astronomy missions at NASA, and ground-based arrays go
up against particle-physics experiments at the Department of Energy
(DOE) and the National Science Foundation (NSF). Philanthropic
donations from the Keck and Simons foundations, among others, are
helping fill the gap.

A solution suggested by some astronomers would be to cut back on
the number of ground-based CMB experiments with similar aims. Critics
contend that ground-based CMB experiments rarely share their data,
which undermines calls for more projects. Most space-science missions
are required to make their data public, says Jean-Loup Puget, an astronomer
at the University of Paris-South and a principal investigator on
Planck. “Ground-based experiments should do so too,” he says.

But others argue that the ground experiments are cheap and that
diversity makes for a healthy field. The one thing that everyone agrees
on is that the science case for another CMB space mission is compelling.
Efforts are afoot to make that happen after 2020.

It may be an uphill battle. No CMB probe was ranked highly in
NASA's 2010 Decadal Survey, a community-led
review that sets scientific priorities to guide

from existing missions that have suffered delays.

A consortium of US experimenters is proposing 
a follow-up to the present South Pole and Atacama telescopes. Known as 
CMB-S4, it would have hundreds of thousands of detectors and could 
come online after 2020 if it is given a high priority in a particle-physics 
review currently being carried out by the DOE and the NSF. Balloons 
could also play a part. “A coherent programme is warranted,” Hanany says.

In Europe, a higher-resolution successor to Planck has not so far 
been selected by the European Space Agency. A revised mission is being 
drawn up for the next round of proposals, and if successful might be 
launched in the mid-2020s. 

But such complicated proposals are expensive and difficult to realize, 
Efstathiou says. ‘Keep it simple’ is now his mantra. He would like to see 
a small mission dedicated to observing B modes on large angular scales,
thus targeting the gravitational-wave signature alone. In effect, it would 
be a BICEP2 experiment in space, says Peter Ade at Cardiff University, 
UK, who has built detectors for ground-based experiments and Planck. 
The technology is mature and he thinks such a mission could be ready 
in five years. 

A Japanese-led satellite proposal called LiteBIRD could be just such a 
mission. Proposed by physicists in Japan, in collaboration with experi- 
menters in the United States, Germany and Canada, the project could 
be launched in the early 2020s if it received some US$100 million in 
funding. In the meantime, the researchers are developing a ground- 
based experimental version, called GroundBIRD. 

For all the uncertainties about the future, CMB scientists are in bull- 
ish mood. “Nature has been kind to us in giving us this clear view of the 
early Universe,” Efstathiou says. “With a gift like that we should exploit
it as much as we can.”
COMMENT


NIH to balance sex in cell 
and animal studies 


Janine A. Clayton and Francis S. Collins unveil policies to ensure that preclinical 
research funded by the US National Institutes of Health considers females and males. 


More than two decades ago, the
US National Institutes of Health
(NIH) established the Office of
Research on Women’s Health (ORWH).
At that time, the Congressional Caucus 
for Women’s Issues, women’s health advo- 
cacy groups and NIH scientists and leaders 
agreed that excluding women from clinical 
research was bad for women and bad for 
science. In 1993, the NIH Revitalization Act 
required the inclusion of women in NIH- 
funded clinical research. 

Today, just over half of NIH-funded 
clinical-research participants are women. 
We know much more about the role of sex 
and gender in medicine, such as that low- 
dose aspirin has different preventive effects 
in women and men, and that drugs such as 
zolpidem, used to treat insomnia, require 
different dosing in women and men. 

There has not been a corresponding revo- 
lution in experimental design and analyses in 
cell and animal research — despite multiple 
calls to action. Publications often continue to 
neglect sex-based considerations and analyses 
in preclinical studies. Reviewers, for the 
most part, are not attuned to this failure. The 
over-reliance on male animals and cells in 
preclinical research obscures key sex differ- 
ences that could guide clinical studies. And it 
might be harmful: women experience higher 
rates of adverse drug reactions than men do. 
Furthermore, inadequate inclusion of female 
cells and animals in experiments and inade- 
quate analysis of data by sex may well contrib- 
ute to the troubling rise of irreproducibility 
in preclinical biomedical research, which the 
NIH is now actively working to address. 

The NIH plans to address the issue of 
sex and gender inclusion across biomedical 
research multi-dimensionally, through 
programme oversight, review and policy, 
as well as through collaboration with 
stakeholders including publishers. This 
move is essential, potentially very powerful 
and need not be difficult or costly. 

NATURE.COM 
Read about NIH reproducibility policy at: 
go.nature.com/rerlef 


BETTER WITH BOTH 
Certain rigorous studies evaluating the 
effects of sex differences have been effec- 
tive in bridging the divide between animal 
and human work. One example concerns 
multiple sclerosis (MS). Women are more 
susceptible to MS than men are, but develop 
less-severe forms of the disease. The most 
widely accepted MS animal model, rodent 
experimental autoimmune encephalomyelitis 
(EAE), has revealed that sex differences in 
MS are related to both reproductive and 
non-reproductive factors. Findings that oestrogen 
therapy provided benefits in rodent EAE 
supported use of an oestrogenic ligand as a 
candidate neuroprotective agent for MS that 
is now being studied. 

Moreover, differences between the sexes in 
both the animal model and human MS have 
now been correlated with genetic factors. For 
example, some Y-chromosome genes (in male 
mice) seem to have a protective effect against 
the disease, and some X-chromosome genes 
(in female mice, with potentially double the 
dosage) have a disease-causing effect. Earlier 
this year, a study demonstrated that mice 
with XY chromosomes in the central nerv- 
ous system had greater neurodegeneration 
than did those with XX chromosomes. The 
findings have important implications for 
other sex-skewed neurological conditions, 
including Parkinson's disease, schizophrenia 
and stroke. Finally, inherited effects have been 
linked to imprinting of genes on sex and non- 
sex chromosomes (autosomes). Maternal 
parent-of-origin effects have been associated 
with MS risk. 

Substance abuse is also affected by sex. 
One target for intervention has been stress 
systems that mediate craving. Female rats 
exhibit a greater response to stress by the 
neurotransmitter norepinephrine than do 
male rats. A promising study published this 
year provides the first evidence, in humans, of 
temporary attenuation of cocaine and alcohol 
craving, anxiety and negative emotion after 
stress in females — but not males — using 
guanfacine, which dampens the body’s 
nervous-system response to stress. 

Typically, reasons for male focus in 
animal-model selection centre on concerns 
about confounding contributions from the 
oestrous cycle. But for most applications, 
female mice tested throughout their hor- 
mone cycles display no more variability than 
males do, as confirmed in a meta-analysis. 

Convention is another probable reason 
for reliance on the male-only models that 
have been typical in many research areas 
for decades. Lack of understanding about 
the potential magnitude of the effect of sex 
on the outcome being measured is likely to 
perpetuate this blind spot. 

The sex of cell lines studied in vitro is also 
too often ignored. Female and male cells 
respond differently to chemical and micro- 
bial stressors. These intrinsic differences are 
hormone-independent but also exhibit fur- 
ther variation on differentiation and expo- 
sure to sex hormones. It is well known that 
many neurological conditions are sexually 
dimorphic, and cell-culture studies have 
demonstrated that male (XY) and female 
(XX) neurons respond differently to vari- 
ous stimuli. Male neurons are more sensitive 
to stress from reactive oxygen species and 
excitatory neurotransmitters; female neu- 
rons are more sensitive to some stimuli that 
prompt the programmed cell death known 
as apoptosis. Data support distinct 
cell-death signalling in female and male neurons 
with potential applications in treatments for 
stroke, brain injury and other conditions. 

There are several approaches to rigorous 
preclinical research with a focus on sex 
and gender. One, the four-core-genotypes 
model, can identify and distinguish between 
the effects of genes and the effects of 
hormones. The four genotypes in this model 
are XX gonadal males or females, and XY 
gonadal males or females. Using this model 
has, for instance, demonstrated influence of 
the sex-chromosome complement as a cause 
of sex differences in obesity and metabolism. 
On a high-fat diet, mice with two X chromosomes 
gained more weight than XY mice did, 
regardless of gonadal sex, and also developed 
a fatty liver and elevated lipid and insulin 
levels. These differences are attributable 
to X-chromosome dosage rather than 
to Y-chromosome effects. 

Various organizations have taken steps 
to increase awareness 
and address unconscious bias about the 
importance of sex and gender in biomedi- 
cal research. Several journals now require 
authors to specify sex- and gender-related 
information. This includes stating the sex of 
animals used (or in the case of primary cells 
or cultures, the sex of the animal from which 
cells are derived) and that of human partici- 
pants in published studies. 


NIH STEPS 

The NIH is now developing policies that 
require applicants to report their plans for the 
balance of male and female cells and animals 
in preclinical studies in all future applications, 
unless sex-specific inclusion is unwarranted, 
based on rigorously defined exceptions. 
These policies will be rolled out in phases 
beginning in October 2014, with parallel 
changes in review activities and requirements. 
Because our goal is to transform how science 
is done, the first step will be the development 
and delivery of training modules and detailed 
policy informed by ongoing data analysis. As 
part of its initiative to enhance rigour, the 
NIH plans to disseminate training on experi- 
mental design for NIH staff, trainees and 
grantees. Evaluation of sex differences will be 
included in these modules. 

In 2013, the ORWH, which oversees the 
NIH-wide research agenda related to sex and 
gender influences, launched a programme 
that provides funding supplements to existing 
grants to add subjects, tissues or cells of the 
sex opposite to that used in the original grant, 
or to increase the power of a study to analyse 
for a sex or gender difference by adding more 
subjects of either sex to a sample that already 
includes both males and females. Although 
this strategy enables the NIH to capitalize on 
the value of current research investments, we 
expect that such a mechanism will no longer 
be needed once policies on sex influences are 
implemented for preclinical research. 
The ORWH will continue to work with 
the US Food and Drug Administration to 
co-fund the Specialized Centers of Research 
on Sex Differences programme, which sup- 
ports interdisciplinary collaborations on sex 
and gender influences in health, and bridges 
basic- and clinical-research approaches. This 
programme also facilitates training in sex and 
gender considerations in experimental design 
and analysis. The ORWH will leverage lessons 
learned from these centres. 

Reviewers of grant applications must also 
be brought to the table, because they pro- 
vide the first insights into taxpayer-funded 
research. The NIH review process will be 
modified in phases, and coordinated with 
requirements for applicants. Reviewers will 
be enjoined to evaluate applicants’ research 
plans to include, compare and contrast 
experimental findings in male and female 
animals and cells. 

Furthermore, the NIH will monitor com- 
pliance of sex and gender inclusion in preclin- 
ical research funded by the agency through 
data-mining techniques that are currently 
being developed and implemented. Impor- 
tantly, because the NIH cannot directly con- 
trol the publication of sex and gender analyses 
performed in NIH-funded research, we will 
continue to partner with publishers to pro- 
mote the publication of such research results. 

In requiring sex and gender inclusion 
plans in preclinical research, the NIH will 
ensure that the health of the United States 
is being served by supporting science that 
meets the highest standards of rigour. 


Janine A. Clayton is director of the 
US National Institutes of Health Office 
of Research on Women’s Health, and 
associate director for research on women’s 
health, Bethesda, Maryland, USA. 
Francis S. Collins is director of the 
US National Institutes of Health, Bethesda, 
Maryland, USA. 

e-mail: janine.clayton@nih.gov 
An artist’s impression of the kind of social group in which early humans may have lived. 


SOCIOBIOLOGY 


The distributed brain 


Herbert Gintis salutes the follow-up to a study on sociality and hominin brain size. 


Sociobiology was born in 1975, when 
biologist Edward O. Wilson published 
the volume that gave the field its name. 
There are many social species, and we can 
gain insight into human sociality by compar- 
ing it with sociality in other animals. 
Several factors have propelled socio- 
biology to prominence in the behavioural 
sciences. Perhaps most important was the 
waning of a major impediment: the idea that 
linking human behaviour to genetics fuels 
racist ideology. Furthermore, as researchers 
learned about the lives of birds, primates and 
insects, the concept of social structure as a 
general biological category arose. Finally, the 
value of interdisciplinary research such as 
sociobiology became apparent in many areas 
in which complex systems cannot be under- 
stood using the conventional field categories. 
Sociobiology in Homo sapiens is espe- 
cially complex because of the enormous part 
played by culture in human evolution, which 
is explored in gene–culture coevolutionary 
theory. In Thinking Big — a follow-up to the 
edited volume Social Brain, Distributed Mind 
(Oxford University Press, 2010) — Robin 
Dunbar, Clive Gamble and John Gowlett 
explore the growth of the brain from early 
hominids to modern Homo sapiens, which 
has a ratio of brain to body mass three times 
that of other primates. Large brains are very 
costly. Increased cranial capacity required 
the restructuring of the human birth canal 
and led to birth before the fetus is fully 
matured. This in turn led to prolonged and 
collective child-rearing. In the average adult 
human, the brain represents about 2% of body 
weight, but consumes about 20% of calories. 
What could the counterbalancing advantages 
of large brains be? 

Thinking Big: How the Evolution of Social Life Shaped the Human Mind 
CLIVE GAMBLE, JOHN GOWLETT AND ROBIN DUNBAR 
Thames and Hudson: 2014. 

The conventional answer has been skil- 
ful tool use. We now know, however, that 
hominin brain growth preceded by more 
than half a million years the emergence of 
material culture — visual art, hafted tools, 
crafted containers and written language, all 
of which began to appear some 70,000 years 
ago. Richard Byrne and Andrew Whiten's 
Machiavellian Intelligence (Oxford Univer- 
sity Press, 1988) insightfully shifted the focus 
from technical to social skills, suggesting that 
a sharp wit conferred fitness by enabling 
individuals to deceive and manipulate. In 
this view, the large brain is the product of an 
arms race that is a drain on species-level fit- 
ness. Given the intense competition among 
hominins to fill the hunter-gatherer niche, 
this theory of human hypercognition seems 
implausible: the energy wasted in mutual 
deception would reduce human fitness, 
favouring small-brained competitors. 

In Thinking Big, Dunbar, Gamble and 
Gowlett supply a more credible theory with 
their “social brain hypothesis”. They describe 
the major findings of the ambitious 7-year 
project ‘Lucy to Language: The Archaeology 
of the Social Brain’, which involved more 
than 30 researchers and 5 UK universities, 
and was backed by the British Academy, the 
national funding body for humanities and 
social sciences. The authors show that there 
is a strong correlation between relative neo- 
cortex volume and mean social-group size in 
monkeys, apes and humans. They attribute 
this to the fact that the complexity of group 
interactions increases with group size. A 
large brain gives individuals the means to 
forge strong social ties that enhance their 
personal fitness and the group’s social 
cohesion; in particular, a large neocortex 
supports a “theory of mind”, whereby 
individuals can form mental representations 
of the beliefs and intentions of others. This 
enables them to enter into complex agree- 
ments and coalitions, and to track multiple 
social relationships through time and space. 

Social Brain, Distributed Mind is rather 
more detailed. It brings together an array 
of archaeologists, anthropologists, geog- 
raphers, psychologists, palaeontologists, 
historians and philosophers involved in 
Lucy to Language, who together construct 
a plausible “cognitive anthropology” that 
defends the social brain hypothesis, while 
exploring the idea that the human mind is 
not confined to individual brains, but lives 
in a social network of minds across which 
cognition is distributed. They thus handle 
the problem of brain growth long preceding 
the emergence of material culture. 

The contributors argue that long before 
material culture, early hominins developed 
a “material memory system” in the form of 
tokens and containers. This linked the men- 
tal power of many individuals, and led to a 
transition from knowledge acquisition and 
learning based on personal discovery to cog- 
nition based on social interaction and shar- 
ing. With distributed cognition, knowledge 
lies not only in the individual, but also in 
the social roles and iconic artefacts that link 
minds. For instance, humans have formed 
‘fission–fusion’ social groupings in which 
kin relationships are maintained across 
subgroups, both males and females migrate 
to marry, and complex, powerful, fitness- 
enhancing familial alliances are sustained. 

How plausible is the idea that we have big 
brains because we evolved to live in large 
groups, putting heavy cognitive demands 
on our ability to forge close social bonds 
with large numbers of individuals? My own 
view is that hunting required a high level of 
coordinated decision-making, and that the 
presence of lethal weapons undermined 
our ape ancestors’ characteristic social- 
dominance hierarchy, which was based on 
the physical prowess of the alpha male. This 
created a leadership void that could be filled 
not by appeal to physical strength, but rather 
by social persuasiveness and then a subtle 
ability to form effective coalitions. 

The result was a political structure in 
which linguistic facility and cognitive skills 
were rewarded with enhanced reproductive 
fitness. The social brain, then, helped our 
ancestors to operate successfully in a 
proto-democratic framework. 


Herbert Gintis is external professor at 
the Santa Fe Institute in New Mexico and 
professor of economics at Central European 
University in Budapest. His most recent 
book is A Cooperative Species (with Samuel 
Bowles). 

e-mail: hgintis@comcast.net 


Books in brief 


The Butterfly Defect: How Globalization Creates Systemic Risks, 
and What to Do about It 
Ian Goldin and Mike Mariathasan PRINCETON UNIVERSITY PRESS (2014) 
In a nod to chaos theory’s butterfly effect, in which tiny 
perturbations unhinge big non-linear systems, this treatise 
explores globalization’s built-in risks. Economists Ian Goldin and Mike 
Mariathasan analyse systemic vulnerabilities leading to cyber-attacks 
or pandemics, and look at the ecological risks integral to globalization. 
The sustainable management of such tangled interdependency, 
they argue, demands governance reform, including the setting up of 
research-led bodies to tackle big issues such as climate change. 


The Third Horseman: Climate Change and the Great Famine of the 
14th Century 
William Rosen VIKING (2014) 
A kink in Europe’s climate during the fourteenth century indirectly 
triggered a seven-year cataclysm that left 6 million dead, William 
Rosen reveals in this rich interweaving of agronomy, meteorology, 
economics and history. The Great Famine ended the explosion in 
agricultural productivity of the 400-year Medieval Warm Period, 
which affected mainly North Atlantic civilizations. Rosen deftly 
delineates the backstory and the perfect storm of heavy rains, hard 
winters, livestock epidemics and war leading to the catastrophe. 


Birdmen: The Wright Brothers, Glenn Curtiss, and the Battle to 
Control the Skies 
Lawrence Goldstone BALLANTINE BOOKS (2014) 
The daredevil scientists and engineers who forged the field of 
aeronautics spring vividly to life in Lawrence Goldstone’s history. 
Wilbur Wright is famed for cracking the conundrum of powered, 
controlled, heavier-than-air flight through leaps of intuition and 
reasoning. Less known is his and brother Orville’s feud with ace flyer 
and motor designer Glenn Curtiss. Goldstone never stints on the 
science in tracing the trio’s patent wars and struggles to monopolize 
the industry over a decade of dazzling innovation. 


The Remedy: Robert Koch, Arthur Conan Doyle, and the Quest to 
Cure Tuberculosis 
Thomas Goetz GOTHAM BOOKS (2014) 
What does germ theory have to do with evergreen fictional sleuth 
Sherlock Holmes? Science writer Thomas Goetz reveals all in this 
history of the hunt to cure tuberculosis (TB), centring on young 
physician Arthur Conan Doyle’s 1890 trip to Berlin to report on 
bacteriologist Robert Koch’s TB remedy, tuberculin. Conan Doyle 
rightly doubted its efficacy. But, impressed by Koch’s postulates that 
particular organisms cause diseases, he intensified his focus on the 
scientific method and the hunting of other insidious villains in fiction. 


Cold Blood: Adventures with Reptiles and Amphibians 
Richard Kerridge CHATTO AND WINDUS (2014) 
Nature writer Richard Kerridge fed, as a child, on accounts of black 
rhinoceroses, red river hogs and mandrills. His native Britain lacked 
such faunal glories — or so he thought, until he discovered the 
glistening hordes of amphibians and reptiles lurking in grass, bogs 
and leaf litter. In this mix of natural history, memoir and thoughts on 
the “cultural functions of wild animals for human beings”, captured 
moments such as the golden flash of a palmate newt delight the 
reader as much as they did Kerridge’s childhood self. Barbara Kiser 
COMMENT | BOOKS & ARTS 


Intoxicating science 


Jamie Goode drinks in two views of that most venerable and destructive drug — alcohol. 


“Booze is civilization in a glass,” 
states Adam Rogers in Proof. This 
science-steeped tale of human- 
ity’s 10,000-year love affair with alcohol is 
an engaging trawl through fermentation, 
distillation, perception of taste and smell, 
and the biological responses of humans 
to booze. Robert Dudley’s The Drunken 
Monkey, by contrast, focuses on the single 
question of why we drink — in many cases, 
to excess. 

Of the two, Proof is the easier read. Rog- 
ers, a senior editor at Wired magazine, 
reveals how alcohol is a spin-off from a 
form of warfare: yeasts use it as a chemi- 
cal weapon in their competition with other 
microbes. The sugar-rich environment of 
ripe fruits is a tantalizing food source for 
organisms ranging from bacteria to pri- 
mates. So much so that yeasts use the rela- 
tively inefficient process of fermentation to 
metabolize the sugar, because it produces 
the waste product ethanol, which poisons 
competitors. 

Humans first consciously exploited this 
around ten millennia ago: the oldest archae- 
ological evidence of alcohol production is a 
pot shard dated to that time, from Jiahu in 
China. For some 9,850 of these years, fer- 
mentation must have seemed a mysterious, 
even mystical transformation — until, in 
1857, French microbiologist Louis Pasteur 
revealed that yeasts are responsible. Even 
now, many aspects of booze — such as the 
link between soil characteristics and notice- 
able local flavours in wine — are yet to be 
fully explained. Exploring some of these 
unanswered questions is where Rogers has 
his fun. 

Along with yeast, Rogers looks at the 
varied sugar sources that fuel booze pro- 
duction: rice, grapes and barley. The latter 
is generally malted, a process in which it is 
allowed to germinate a little to initiate the 
conversion of starch to sugar. Here we meet 
the remarkable Jokichi Takamine, a Japa- 
nese chemist who worked intermittently 
in the United States, and who in the late 
nineteenth century devised a way to break 
down starch without malting. His technique 
could have revolutionized whisky produc- 
tion, but it was never developed commer- 
cially because it threatened the livelihoods 
of maltmen. Rogers then takes us through 
fermentation and examines distillation, 
first invented some 2,000 years ago. Steam 
distillation (the process of heating and 
cooling through which alcohol and water, 
which have different boiling points, are 
separated) was probably invented in China, 
as evidenced by Han Dynasty bronze stills 
(a detail noted in The Drunken Monkey). 
Today most stills are copper, Rogers shows 
in Proof, because reactions with the metal 
get rid of the smelly volatile sulphur 
compounds produced by fermenting yeasts. 
As a result, the still’s walls gradually thin, 
giving the vessel a lifespan of just 25 years 
or so. 

Proof: The Science of Booze 
ADAM ROGERS 
Houghton Mifflin Harcourt: 2014. 

The Drunken Monkey: Why We Drink and Abuse Alcohol 
ROBERT DUDLEY 
University of California Press: 2014. 

Fermenting fruit has held allure for millennia. 

NATURE.COM 
For Harold McGee on fermented food, see: 
go.nature.com/ysoefj 

In discussing how alcoholic drinks are 
aged, Rogers pays particular attention to 
the role of barrels in producing whisky, 
bourbon and wine. Slow but steady expo- 
sure to low levels of oxygen, and the leach- 
ing from the oak of flavour compounds such 
as lactones and vanillin — which give notes 
of coconut and vanilla, respectively — are 
both important in shaping the flavour of 
these drinks. 

Proof is an entertaining, well researched 
piece of popular-science writing. Rogers 
scrutinizes the role of smell and taste in wine 
tasting; this has proved a fertile ground for 
scientists interested in sensory perception, 
partly because of wine professionals’ highly 
detailed descriptions of how they perceive 
taste. There is some interesting discussion 
about the connection between flavour per- 
ception, the words we use to describe the 
flavours and the chemical composition of 
the drink. Sensory scientists use statistical 
techniques to link aromas perceived by tast- 
ers with chemical analysis of wines, but it can 
be a daunting task. Rogers also dips into the 
downside: how alcohol affects our physiol- 
ogy, and hangovers — perhaps caused in 
part by the accumulation of the breakdown 
product of alcohol, the fairly toxic molecule 
acetaldehyde. 

That downside is amply explored by 
Dudley in The Drunken Monkey. He, too, 
gives plenty of background on the history 
of drinking, the effects of alcohol on health 
and alcoholism. But the thrust of the book 
is an attempt to explore human alcohol use 
through the lens of evolution. He puts the 
roots of overindulgence in ancient primate 
tendencies to seek out ripe, sugar-rich fruits, 
which would often have some alcohol con- 
tent because of yeast activity. Dudley, an 
evolutionary biologist, hypothesizes that 
alcohol activates neural pathways that were 
once nutritionally useful, but now falsely 
signal reward after excess consumption. 
This is a well constructed, clearly written 
book, but the overall impression is that 
Dudley’s hypothesis, interesting as it is, 
needs more data points. Information on the 
blood alcohol levels of animals that have fed 
on fermenting fruit would be particularly 
welcome. 

It is remarkable that, where not disallowed 
by religious beliefs, alcohol consumption has 
remained prevalent in so many human cul- 
tures, as both these books show. And what 
of the future? Most alcoholic drinks are 
still concocted using ancient techniques. 
Decades hence, will burgeoning scientific 
knowledge lead to new methodologies and 
novel forms of booze? Or will our descend- 
ants still be quaffing versions of the same old 
elixirs, magicked out of grain and grape by 
the time-honoured processes of fermentation 
and distillation? 


Jamie Goode is the author of The Science 
of Wine. He blogs at www.wineanorak.com 
and is based in London. 

e-mail: drjamiegoode@gmail.com 
The Ölfusá is Iceland’s largest river by volume. 


DEVELOPMENT 


Dammed dreams 


Monya Baker is swept along by a documentary film 
tracing humanity’s complex relationship with water. 


Environmental degradation has never 
looked so gorgeous. Over its 92 minutes, 
the documentary film Watermark 
deluges viewers with astonishing images of 
dry riverbeds, bizarre irrigation schemes, 
dams and city-scale aquaculture. The work 
of photographer and co-director Edward 
Burtynsky, co-director Jennifer Baichwal 
and cinematographer Nick de Pencier, the 
film is the fruit of a five-year odyssey in 
which Burtynsky recorded how water has 
shaped humanity, and vice versa. Shooting 
in ultra-high-definition video, he and his 
team specialized in aerial shots. They hoisted 
cameras up flagpole-high winches, and shot 
through the open floor of a helicopter and 
from airborne drones with cameras flying on 
a gyro-stabilized mount. 

The documentary covers some 20 stories 
across 10 countries, filming everywhere 
from within a glacier to far above Earth. It 
shows how water is studied, diverted and 
polluted for profit and for pleasure, with 
eye-popping effects on communities and 
landscapes. This is no call to action, 
however: Burtynsky says that his images could 
fit as well on the cover of a mining 
company’s corporate report as in an 
environmental fund-raising campaign. 
But the images do carry a message. 

Watermark 
DIRECTED BY EDWARD BURTYNSKY AND JENNIFER BAICHWAL 
2013; in cinemas now. 

Most start as mysteries. The film begins 
wordlessly, with storeys-high sepia splashes 
that finally resolve into a desilting project 
at the Xiaolangdi Dam on China’s Yellow 
River. This cuts quickly to acres of baked, 
cracked mud: the ghost of a Mexican river 
that no longer meets the sea. “Once, the river 
was beautiful,” a small, wrinkled woman, a 
former local resident, says in Spanish. As 
if afraid that she may be remembering a 
dream, Inocencia Gonzalez Sainz recounts 
how bountiful fish were in the Colorado 
River Delta, now a desert because of dams, 
including the Hoover Dam, less than an 
hour’s drive from Las Vegas, Nevada. 

A shot of what appears to be the night sky 
comes slowly into focus, gradually revealed 
to be stacks of gleaming metal cylinders — 
hand-annotated in black and filled with ice 
cores drilled from deep within a Green- 
land glacier. They are also core to the film’s 
scientific content. Jorgen Peder Steffensen 
and Dorthe Dahl-Jensen, who study the 
cores, explain what they reveal about 
ancient climate, and how easily climate can 
flip between stable states, some inhospitable 
to humans. Infusing water with awe, they 
tell how Earth’s oceans were probably deliv- 
ered by comets. 

Burtynsky captures large-scale water- 
control efforts, both ancient and modern. In 
the steeply terraced rice paddies of China's 
rural Yunnan province, a teenager sporting 
a pink-sequined bowler hat spends his days 
in solitary walks as a “water guard’, making 
sure that no one shifts the carved logs that 
allocate streamflow to his and other families’ 
fields. Time-lapse photography of the con- 
struction of the Xiluodu Dam in Yunnan 
shows a month’s worth of water turning land 
into reservoir; the footage then slows, zooming 
in on a spider that has futilely climbed to 
the top of an island of debris, its searching 
feet finding water on all sides. 

Burtynsky’s lens does not neglect the 
developed world. One sequence starts inside 
a chlorinated swimming pool. As the shot 
pans up we see that the pool is a huge tub 
of concrete that is itself sitting in water on 
a reconstructed “waterfront” in California. 
The curlicued streets are dotted with identi- 
cal houses, many with a pool jutting into a 
river delta. The excess is at once disgusting 
and beautiful. 

In a sequence on farming, a helicopter 
flies over a disturbingly artificial landscape 
of green and brown circles in Texas. They 
are created by long, rotating irrigation pipes 
that drain the Ogallala Aquifer beneath 
faster than it can be refilled. A helicopter 
pilot tells Burtynsky that a volume greater 
than Lake Erie has already been consumed. 

Water can be trashed as well as taken. 
At a thirsty tannery in Dhaka, preternatu- 
rally blue waste water pours untreated into 
the Buriganga River. A chemist at the plant 
dispassionately describes the toxin-laced 
tanning process. In one scene, sari-clad 
women wearing black rubber gloves pack up 
piles of hide; they stand on the chromium- 
laced scraps in bare feet. 

In a metaphysical but scientifically accurate 
discourse, an indigenous Canadian 
explains water as the stuff of spiritual connec- 
tion. From a canoe in the middle of a boreal 
lake in British Columbia, he reminds us of a 
unique web of kinship in the Stikine River 
valley. Every living thing in it is composed 
mainly of a shared substance: water. 


Monya Baker is acting Comment editor for 
Nature in San Francisco, California. 
Tamiflu reviewers
respond to critics 


As authors of the Cochrane 
review that questions the 
stockpiling of the antiviral 
drugs Tamiflu (oseltamivir) and 
Relenza (zanamivir) against 
influenza pandemics, we wish 
to clarify aspects of your report 
on criticisms of the review (see 
Nature 508, 439-440; 2014). 

We agree that the randomized 
clinical trials of Tamiflu were 
“not designed to test for the 
severe outcomes”. But far from 
undermining our review, this is 
actually one of our important 
findings. This is because, for 
years, governmental bodies 
justified stockpiling Tamiflu 
(see go.nature.com/ucyjwb 
and go.nature.com/oi9zbg) on 
the basis of a short analysis of
ten pooled randomized trials 
(L. Kaiser et al. Arch. Intern. Med. 
163, 1667-1672; 2003). That
study was authored by researchers 
at Roche, the manufacturer 
of Tamiflu, and concludes 
that the drug significantly 
reduces complications and 
hospitalizations in healthy and 
at-risk adults. 

Our Cochrane review, by 
contrast, independently evaluated 
data from the full, previously 
confidential, trial-evidence base 
— something that officials should 
have done themselves. Critics 
of our research miss the point 
about what our findings say about 
government accountability. 

You incorrectly refer to 
the randomized trials as 
“small”, which would call 
the generalizability of the 
conclusions into question. In 
fact, trial M76001 had more than 
1,400 participants, and the two 
pivotal studies (WV15670 and 
WV15671) each had more than
600 participants. You also omit to 
mention that the trials enrolled 
at-risk as well as healthy subjects. 

Your report cites an 
observational study in which 
neuraminidase inhibitors (the 
drug class to which Tamiflu 
and Relenza belong) reduced 
mortality in hospitalized patients
during the H1N1 influenza
outbreak in 2009-10, apparently 
aligning with criticisms of our 
review for not including such 
observational studies. However, 
you omitted to mention the 
limitations of that study — or that 
it was funded by Roche. 

We stand by our conclusion 
that government decisions to 
stockpile Tamiflu should be 
backed by high-quality evidence 
of safety and effectiveness. 
Peter Doshi University of 
Maryland School of Pharmacy, 
Baltimore, Maryland, USA. 
pdoshi@rx.umaryland.edu 
Tom Jefferson The Cochrane 
Collaboration, Rome, Italy. 

The authors declare competing 
financial interests: see
go.nature.com/wudyco for details.


Prion identity 
wrongly credited 


The review of Stanley Prusiner’s 
autobiography (G. Mallucci 
Nature 508, 180-181; 2014) 
suggests that the idea of an 
infectious protein was first put 
forward by Tikvah Alper and 
colleagues (Nature 214, 764- 
766; 1967) and by John Stanley 
Griffith (Nature 215, 1043- 
1044; 1967). This perpetuates a 
common myth. 

Alper concluded from 
radiation-inactivation data that 
the agent that causes scrapie,
a neurodegenerative sheep
disease, does not depend on 
either a nucleic acid or a protein 
to replicate, favouring an earlier 
suggestion that it might be a 
replicating polysaccharide. 

Griffith opens his paper by 
crediting the idea that the scrapie 
agent is a protein to an earlier 
paper by Alper and colleagues 
(T. Alper et al. Biophys. Biochem. 
Res. Commun. 22, 278-284; 
1966), and also to I. H. Pattison 
and K. M. Jones (Vet. Rec. 80, 
2-9; 1967). In fact, this earlier 
Alper paper does not contain the 
word ‘protein’. Griffith’s second
claim is correct. Pattison and 
Jones made their suggestion 
because the techniques they used 


288 | NATURE | VOL 509 | 15 MAY 2014 


to purify the scrapie agent were 
the same as those used to purify 
small basic proteins. 

This myth probably persists 
because the key 1967 papers 
are not freely accessible online, 
making it harder for today’s 
busy scientists to check their 
facts. 

R. John Ellis University of 
Warwick, Coventry, UK. 
r.j.ellis@warwick.ac.uk 


Call for UN to act 
on food security 


The latest report from the 
Intergovernmental Panel
on Climate Change (IPCC)
indicates that the rise in 
greenhouse-gas emissions is 
affecting food production, 
particularly in poor tropical 
regions (see go.nature.com/afvyfg).
As director of the
CGIAR Research Program on 
Climate Change, Agriculture 
and Food Security, I call for 
next month’s session of the 
United Nations Framework 
Convention on Climate Change 
to act urgently on these findings 
(see go.nature.com/Irwfnw). 
Climate-change adaptation must 
become the priority for policy- 
makers around the world. 

The UN Food and
Agriculture Organization
(FAO) has confirmed that food 
prices are rising sharply (see 
go.nature.com/yavdzo). Recent 
geopolitical tensions, such as the 
ongoing situation in Ukraine, 
are partly to blame, but unusual 
adverse weather conditions are a 
main culprit. 

Extreme climate events 
such as floods, tornadoes and 
droughts are becoming more 
frequent. Yields of wheat 
and maize (corn) are falling; 
warming oceans are wreaking 
havoc on fish harvests; and 
rising sea levels threaten to wash 
away fertile coastal regions. As 
the FAO report shows, these 
factors are increasing global 
food insecurity. 

Governments have been 
too slow to react. Research 
and innovation should start
now because it can take up to 
20 years to see results. The UN 
must stop procrastinating on 
adaptation funding, and use 
the IPCC and FAO reports as 
an impetus for action against 
fractured food production (see 
also T. MacMillan and
T. G. Benton Nature 509, 25-27;
2014). 

Bruce Campbell CGIAR, 
Copenhagen, Denmark. 
b.campbell@cgiar.org 


Ocean pollution foils 
search for plane 


An international search to locate 
missing Malaysia Airlines flight
MH370, which disappeared on 
8 March, is under way in the 
southern Indian Ocean. Various 
objects seen floating in the ocean 
and washed up on the shores of 
western Australia briefly raised 
hopes that traces of the plane had 
been found. Unfortunately, such 
litter is ubiquitous in the oceans. 
Finding traces of humanity 
in the sea has never been 
easier, thanks to sophisticated 
technology. But finding 
evidence of the plane’s 
whereabouts is proving much 
more difficult — despite 
numerous and ongoing research 
successes with the marine- 
observation systems employed 
(see www.ioc-goos.org). 
Perhaps the tragedy of the 
false litter trail of flight MH370 
will help to raise the public's 
awareness of the need to protect 
the oceans from pollution (see 
www.gpa.unep.org). 
Keith Alverson United Nations 
Environment Program, Nairobi, 
Kenya. 
keith.alverson@unep.org 
Shifting storms


An analysis of historical storm data reveals that the average latitude at which tropical cyclones attain their maximum 
intensity has undergone a pronounced shift towards the poles over the past three decades. SEE LETTER P.349 


HAMISH RAMSAY 


Considerable attention has been devoted
to the regional and global effects of climate
variability and climate change on
the behaviour of tropical cyclones over the 
past decade or so. Catastrophic events such 
as Hurricane Katrina (2005), Cyclone Nargis 
(2008), Hurricane Sandy (2012) and Typhoon 
Haiyan (2013) have led scientists and non- 
scientists alike to ask how climate change is 
affecting the intensity, frequency and location 
of tropical cyclones around the globe. There 
is a general consensus among experts that 
anthropogenic warming will lead to fewer, 
but more intense, tropical cyclones (ref. 1). However,
little attention has been paid to understanding 
long-term shifts in the geographical location of 
these cyclones, particularly when at their peak 
intensities (Fig. 1). 

On page 349 of this issue, Kossin and co-authors
(ref. 2) shed light on this aspect by examining
trends in the latitude at which the maximum 
intensities of storms occur, a metric referred 
to in their study as the lifetime-maximum
intensity (LMI). Their findings reveal a pronounced
migration of the annual-mean LMI
towards the poles over the past 30 years, 
at a rate of about 1° of latitude per decade, 
although this metric varies considerably on 
regional scales. If this poleward migration of 
tropical-cyclone LMI continues, it will prob- 
ably have major impacts, including increased 
threats to coastal communities that have 
historically not been susceptible to hazards 
posed by tropical cyclones. 

The observed poleward trends in the annual- 
mean LMI are consistent with, and within the 
range of, the observed expansion of the tropics 
since about 1979 (refs 3, 4). Several climate- 
related features have been used to diagnose 
this expansion, which is thought to be due to 
increased concentrations of anthropogenic 
greenhouse gases. These features include 
ozone depletion in the stratosphere, which lies 
just above the lowest portion of the atmosphere 
(the troposphere); the height of the boundary 
between the stratosphere and the troposphere 
(the tropopause); and the width of the Hadley 
circulation, the main meridional overturning
circulation in the troposphere, which is
characterized by rising air and thunderstorms 
near the Equator and dry, sinking air at around 
30° north and 30° south, where many of the 
world’s deserts are found. 

Kossin et al. suggest that two factors known 
to modulate tropical-cyclone development 
and intensity may have contributed to the 
observed poleward migration of annual-mean 
LMI: deep-layer vertical wind shear, that is, the 
absolute difference between wind speeds in the 
upper and lower troposphere; and potential 
intensity, a thermodynamically based theoreti- 
cal upper limit of tropical-cyclone intensity that 
depends on local sea surface temperature and 
atmospheric temperature and humidity. Many 
storms never achieve their potential intensity 
because of competing influences, such as strong 
vertical wind shear and intrusions of dry air. 
However, in principle, increased potential 
intensity and decreased vertical wind shear 
should promote more-intense storms, all other 
factors being equal. That such trends moving 
away from the Equator have been observed 
over the past 30 years (see Fig. 2 of ref. 2)


Figure 1 | Global distribution of tropical cyclones at their peak intensities. The background image is from NASA’s Visible Earth catalogue, and the tropical-cyclone
data come from the National Climatic Data Center’s IBTrACS archive (refs 10, 11) for the period 1982-2012. Only the locations of storms that achieved an
intensity of at least a category 1 hurricane (that is, a wind speed of at least 119 kilometres per hour) are shown. The locations represent a subset of the ‘best-track’
data used by Kossin and colleagues (ref. 2) to construct global and regional trends in the mean latitude at which storms reached their maximum intensities.
BACKGROUND IMAGE: NASA GODDARD SPACE FLIGHT CENTER


therefore seems at least qualitatively consistent 
with the observed poleward migration of 
annual-mean LMI. 

Despite the large and statistically significant 
global trends in the annual-mean latitude of 
LMI, substantial region-to-region and year-to- 
year variability is evident. For instance, the North 
Atlantic region, which has received considerable 
media attention owing to events such as hurricanes
Katrina and Sandy, shows almost no poleward
trend on the basis of historical ‘best-track’
data over the past 30 years. Moreover, when the
authors used a state-of-the-art data set of tropical-cyclone
intensity (ADT-HURSAT; ref. 5),
an opposite, equatorward, trend was found for the
North Atlantic (see Table 1 of ref. 2). Such
regional differences in trends are probably due 
to climate modes that extend in time beyond the 
period for which accurate satellite-based data 
are available. 

This is one of the limitations of trend stud- 
ies based on satellite-derived estimates of 
tropical-cyclone intensity. Although the post- 
1970s geostationary satellite era is considered 
to be the most accurate part of the historical 
tropical-cyclone record, the relatively short 
observation period hampers the detection of 
trends influenced by modes of climate variabil- 
ity whose periodicity spans decades or longer, 
such as the Pacific Decadal Oscillation (ref. 6). Any
such variability implies that regions in which 
the poleward migration of annual-mean LMI 
has been more pronounced over the past 
30 years might experience less-pronounced 
trends in the coming decades, and vice versa. 
Even on a global scale, a trend of 1° of latitude
per decade of tropical expansion (that is, a 
10° shift per century, assuming a constant rate 
of expansion) cannot be sustained without 
implausible changes to fundamental physical 
constraints on the global atmospheric circula- 
tion, such as Earth’s rotation rate. 
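
The arithmetic behind this caveat can be sketched in a few lines. The snippet is an illustration only: it extrapolates the reported rate of about 1° of latitude per decade at a constant pace, which is exactly the assumption argued above to be unsustainable. The function name and the time spans are hypothetical choices made for the example.

```python
def extrapolated_shift(rate_deg_per_decade: float, years: float) -> float:
    """Latitude shift in degrees if a constant rate held for `years` years."""
    return rate_deg_per_decade * years / 10.0


# About 1 degree of latitude per decade, the rate quoted in the text.
RATE_DEG_PER_DECADE = 1.0

# Hypothetical time spans, chosen only to show how quickly the shift grows.
for span_years in (30, 100, 300):
    shift = extrapolated_shift(RATE_DEG_PER_DECADE, span_years)
    print(f"{span_years:>3} years at a constant rate: {shift:.0f} degrees poleward")
```

Three centuries at this pace would imply a shift comparable to the present distance from the Equator to the dry zones near 30° north and 30° south, which is one way to see why the rate cannot simply continue.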

On year-to-year timescales, variability in 
tropical-cyclone formation and track is dominated
by the phase of the El Niño–Southern
Oscillation (ENSO), the episodic warming
(El Niño) and cooling (La Niña) of the surface
temperature of the tropical Pacific Ocean.
El Niño often promotes an equatorward
migration of tropical-cyclone activity, whereas
during La Niña a poleward displacement is
observed (ref. 7), concomitant with changes in the
width and intensity of the Hadley circulation (ref. 8).
It is therefore plausible that any trend in ENSO 
could project onto trends in tropical-cyclone 
activity. Kossin et al. attempt to remove this 
contribution by accounting for the effect of 
ENSO on the linear trend of annual-mean LMI 
latitude and then examining the residual data. 
The poleward migration remains pronounced 
and statistically significant, suggesting that 
ENSO plays only a minor part in the long-term 
hemispheric and global trends. 
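
One generic way to carry out this kind of check is to regress the latitude series on an ENSO index and then fit a linear trend to the residuals. The sketch below shows that idea on made-up numbers; it is not the authors' code, and the exact procedure used by Kossin et al. may differ. The synthetic series, the alternating index and all function names are assumptions introduced for illustration.

```python
def ols(x, y):
    """Ordinary least-squares fit y ~ a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_trend(years, latitude, enso_index):
    """Linear trend (degrees per year) left after regressing out an ENSO index."""
    intercept, slope = ols(enso_index, latitude)
    residuals = [lat - (intercept + slope * e)
                 for lat, e in zip(latitude, enso_index)]
    return ols(years, residuals)[1]


# Synthetic data (not observations): a 0.1 degree-per-year poleward trend
# plus an equatorward displacement whenever the hypothetical index is positive.
years = list(range(1982, 2013))
enso = [1.0 if i % 2 == 0 else -1.0 for i in range(len(years))]
latitude = [20.0 + 0.1 * (y - 1982) - 0.5 * e for y, e in zip(years, enso)]

raw_trend = ols(years, latitude)[1]
adjusted_trend = residual_trend(years, latitude, enso)
print(f"raw trend:      {raw_trend:.3f} degrees per year")
print(f"residual trend: {adjusted_trend:.3f} degrees per year")
```

In this toy series the index has no long-term trend of its own, so the poleward trend survives the removal essentially unchanged, which mirrors the qualitative result described above.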

Kossin and colleagues’ findings provide 
insight into the response of global tropical- 
cyclone activity to a changing climate. However,
several questions remain unanswered.
For instance, will future changes in wind 
patterns cause storms to move towards or 
away from coastlines (ref. 9)? What are the key
mechanisms driving the observed tropi- 
cal expansion, and how do these tie in with 
factors known to modulate tropical-cyclone 
intensity? Such questions remain the subject 
of future research.


Hamish Ramsay is at the ARC Centre of 
Excellence for Climate System Science and 
the School of Mathematical Sciences, Monash 
University, Victoria 3800, Australia. 

e-mail: hamish.ramsay@monash.edu 


1. Knutson, T. R. et al. Nature Geosci. 3, 157-163 (2010).

2. Kossin, J. P., Emanuel, K. A. & Vecchi, G. A. Nature
509, 349-352 (2014).

3. Hartmann, D. L. et al. in Climate Change 2013:
The Physical Science Basis. Contribution of Working
Group I to the Fifth Assessment Report of the
Intergovernmental Panel on Climate Change (eds
Stocker, T. F. et al.) Ch. 2, 226-229 (Cambridge
Univ. Press, 2013).

4. Seidel, D. J., Fu, Q., Randel, W. J. & Reichler, T. J.
Nature Geosci. 1, 21-24 (2008).

5. Kossin, J. P., Olander, T. L. & Knapp, K. R. J. Clim. 26,
9960-9976 (2013).

6. Grassi, B., Redaelli, G., Canziani, P. O. & Visconti, G.
J. Clim. 25, 3282-3290 (2012).

7. Ramsay, H. A., Camargo, S. J. & Kim, D. Clim. Dyn.
39, 897-917 (2012).

8. Nguyen, H., Evans, A., Lucas, C., Smith, I. & Timbal, B.
J. Clim. 26, 3357-3376 (2013).

9. Barnes, E. A., Polvani, L. M. & Sobel, A. H. Proc. Natl
Acad. Sci. USA 110, 15211-15215 (2013).

10. www.ncdc.noaa.gov/ibtracs

11. Knapp, K. R., Kruk, M. C., Levinson, D. H., Diamond, H. J.
& Neumann, C. J. Bull. Am. Meteor. Soc. 91,
363-376 (2010).


SYNTHETIC BIOLOGY


New letters for 
life’s alphabet 


The five bases found in nucleic acids define the ‘alphabet’ used to encode life on 
Earth. The construction of an organism that stably propagates an unnatural DNA 
base pair redefines this fundamental feature of life. SEE LETTER P.385 


ROSS THYER & JARED ELLEFSON 


All known life forms store and transmit
information from generation to
generation using the bases found in
nucleic acids: adenine, cytosine, guanine, thy- 
mine and uracil. In nucleic-acid double helices, 
these form base pairs (guanine with cytosine, 
and either adenine with thymine in DNA, or 
adenine with uracil in RNA), which are mostly 
orthogonal — that is, little pairing occurs 
between other combinations of bases. How- 
ever, this ‘alphabet’ seems to be an accident 
of history rather than a functional necessity, 
given that other orthogonal base pairs have 
been synthesized and shown to be processed 
by DNA-replication enzymes in vitro (ref. 1). Because
life on Earth is biochemically uniform, the 
formal possibility of alternative alphabets 
requires strong experimental proof. In this 
issue, Malyshev et al. (ref. 2; page 385) provide just
such a proof, by conclusively showing that an 
unnatural base pair can be stably propagated 
in the bacterium Escherichia coli. 

Shortly after the discovery of DNA, it was 
proposed (ref. 3) that analogues of natural bases
could form a third functional pair, but nearly 
30 years passed before advances in organic 
synthesis and the development of methods for 
amplifying DNA gave scientists free rein to
explore this hypothesis. In 1989, a base pair 
formed from isomers of guanine and cytosine 


© 2014 Macmillan Publishers Limited. All rights reserved 


was synthesized, and replication, transcription 
and even translation of DNA sequences incor- 
porating this base pair were demonstrated 
in vitro (refs 1, 4). Then in 1995 came the surprising
finding (ref. 5) that hydrogen bonding between bases
was not an absolute requirement for comple- 
mentary binding, and could be replaced by 
steric compatibility (the fitting together of 
matching molecular shapes) and hydrophobic 
interactions. This culminated in the independ- 
ent development of three highly orthogonal 
base pairs (refs 6–8), each capable of in vitro replication
fidelity exceeding 99%.

Malyshev et al. now describe the development
of a bacterium capable of faithfully replicating
a plasmid (a small, circular DNA
molecule) containing the hydrophobic
d5SICS:dNaM base pair (Fig. 1), thus creat- 
ing the first organism to harbour an engi- 
neered and expanded genetic alphabet. This 
feat was far from simple: the authors first had 
to find a way of getting the bacterium to take 
up unnatural nucleotides, and then to work 
within the constraints of the billion-year-old 
habits of polymerases, the enzymes that syn- 
thesize polymeric nucleic acids. 

NATURE.COM: For more on the expanded genetic alphabet, visit go.nature.com/gmcheg

To solve the first problem, Malyshev and colleagues engineered an
E. coli strain that expressed an algal nucleotide
triphosphate transporter
Figure 1 | Prospects for organisms that propagate unnatural DNA base pairs. Malyshev et al. (ref. 2) have
generated a strain of the bacterium Escherichia coli that expresses an algal transporter protein (green),
which imports the unnatural nucleotide triphosphates dNaM and d5SICS from the culture medium. This
allows the bacteria to replicate a plasmid that incorporates the unnatural dNaM:d5SICS base pair (red
dot). In turn, this might open up many further developments, including: organisms that can add new
codons to the genetic code through customized codon–transfer-RNA interactions (XAA and YTT are
the anticodon and codon of the tRNA and messenger RNA, respectively; X and Y are unnatural bases, A
and T are natural bases); organisms that faithfully replicate and depend on unnatural base pairs, which
might allow genome evolution; and non-coding RNAs (such as riboswitches, ribozymes and those in
ribonucleoproteins) that have augmented functions. Me, methyl group; R represents the sugar and
phosphate groups of the nucleotide triphosphates.


(NTT) protein, which allowed direct import of 
the nucleotides d5SICS and dNaM. To ensure 
efficient replication, the authors placed the 
unnatural base pair in a region of a plasmid pre- 
dicted to be replicated solely by DNA polymerase 
I. Rather than being the workhorse of DNA replication,
this enzyme fills in gaps in DNA molecules
or connects ‘Okazaki’ DNA fragments, and
has been shown (ref. 9) to replicate the d5SICS:dNaM
pair efficiently in vitro. 

After introducing a plasmid containing a 
single d5SICS:dNaM base pair into E. coli and 
supplementing the media with the two unnat- 
ural nucleotides, the researchers demonstrated 
that the unnatural base pair was retained in the 
plasmid after days in culture. They proved the 
presence of the unnatural base pair in recov- 
ered plasmids using a battery of techniques. 
Retention of the unnatural base pair after 
15 hours of cell growth and plasmid replication 
was estimated to be at least 99.4% per doubling 
of the plasmid, an error rate no worse than that 
of some viral polymerases. 
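
A per-doubling figure compounds over successive rounds of replication, which is why long-term retention is a separate question. The sketch below simply raises the quoted lower bound of 99.4% per doubling to the power of the number of doublings. The doubling counts are hypothetical: the text gives the per-doubling estimate and the 15 hours of growth, not the number of doublings, and the function names are ours.

```python
def cumulative_retention(per_doubling: float, doublings: int) -> float:
    """Fraction of plasmids still carrying the pair after `doublings` rounds."""
    return per_doubling ** doublings


def per_doubling_from_total(total: float, doublings: int) -> float:
    """Inverse calculation: per-doubling retention implied by a measured total."""
    return total ** (1.0 / doublings)


PER_DOUBLING = 0.994  # lower bound quoted in the text

# Hypothetical doubling counts, for illustration only.
for n in (10, 20, 100):
    kept = cumulative_retention(PER_DOUBLING, n)
    print(f"{n:>3} doublings: {kept:.1%} of plasmids retain the unnatural pair")
```

Under these assumptions the retained fraction falls steadily with every doubling, so a cell that merely tolerates the pair would be expected to lose it eventually unless something selects for keeping it.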

The next step will be to ensure long-term 
retention, which may require the engineering 
of a bacterium that depends on the unnatural 
base pair. It may be that the biological machin- 
ery used in Malyshev and colleagues’ E. coli 
will allow the organism to readily adopt the 
unnatural bases as part of its own genetic 
alphabet. If so, this would open up a new vista 
in which human engineering can leap chasms
previously unfathomable to evolution. This 
may seem fanciful, but wholesale reassignment
of the genetic code produced through
several billion years of evolution also seemed
unlikely, and has nonetheless recently been
achieved (ref. 10).

Once unnatural base pairs are not just 
tolerated by an organism, but also accepted 
and used, the next crucial step will be to dem- 
onstrate that they can be transcribed into RNA 
in vivo. From there, the opportunities multi- 
ply quickly (Fig. 1) — for example, unnatural 
nucleotide pairs might augment functional 
RNA elements, such as riboswitches and 
ribozymes. The incorporation of unnatural 
nucleotides into DNA promoter sequences or 
repressor binding sites (which initiate or sub- 
due gene expression, respectively, by acting as 
binding sites for proteins), in conjunction with 
engineering of their partner proteins, might be 
used to formulate new and independent regu- 
latory architectures. 

Similar engineering feats could also provide 
unique functionality to RNA-protein com- 
plexes, for instance, by restricting the binding 
of the Cas9 enzyme (a widely used tool for gen- 
erating double-strand breaks in DNA) to sites 
containing an unnatural base pair. But perhaps 
the ultimate application of such base pairs will 
be to add novel codons — triplets of nucleo- 
tides that encode which amino acids are incor- 
porated into proteins — to the genetic code 
through customized codon–transfer-RNA
interactions. This would greatly expand the 
number of available codons that can be 
assigned new translational functions, such 
as encoding non-standard amino acids, and 
would prevent synthetic biologists from having 
to recode the translational functions of existing 
codons (ref. 10) through painstaking genome engineering.
In other words, an expanded genetic
alphabet will help build an expanded trans- 
lational alphabet. 
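
The scale of this expansion follows from simple counting. As a rough illustration, the sketch below counts the possible triplets for a four-letter and a six-letter DNA alphabet. It assumes, purely for the sake of the arithmetic, that every triplet containing an unnatural base could in principle serve as a codon, which the work described here does not demonstrate; the function name is hypothetical.

```python
def codon_count(n_bases: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over an alphabet of n_bases."""
    return n_bases ** codon_length


natural = codon_count(4)   # A, C, G and T
expanded = codon_count(6)  # the natural bases plus one unnatural pair

print(f"four-letter alphabet: {natural} possible triplets")
print(f"six-letter alphabet:  {expanded} possible triplets")
print(f"additional triplets:  {expanded - natural}")
```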

But why stop at six letters in DNA? The NTT 
used by Malyshev et al. may be fairly promis- 
cuous, importing both natural and unnatural 
nucleotides indiscriminately. Other groups 
have developed unnatural base pairs (refs 7, 8) that
could be equally acceptable substrates for the 
transporter and for the cellular replication 
machinery. If the technique for introducing 
d5SICS:dNaM into E. coli works for other 
pairs, then the DNA code could be extended 
well beyond three base pairs. This raises fun- 
damental questions about why life settled on 
only two in the first place, and whether semi- 
synthetic organisms with the capacity to store 
more information will have expanded capabili- 
ties (as we envisage above) or endure intoler- 
able fitness costs (owing to inherently lower 
fidelity of DNA replication, RNA misfolding 
or translation-error catastrophes). 

Attempts to expand the genetic alphabet 
bravely question the idea of the universal 
nature of DNA, and potentially draw criticism 
about the wisdom of tinkering with it. Such 
criticisms should be solidly countered by syn- 
thetic biologists at the outset. James Watson 
and Francis Crick’s discovery of base pair- 
ing in DNA yielded a mechanism for genet- 
ics, but now genetics has inexorably yielded 
a mechanism for greater biological diversity, 
and thus potentially for building a better 
biological future.


Ross Thyer and Jared Ellefson are in the 
Center for Systems and Synthetic Biology, 
University of Texas at Austin, Austin, Texas 
78712-1095, USA. 

e-mail: ross.thyer@utexas.edu 


1. Switzer, C., Moroney, S. E. & Benner, S. A.
J. Am. Chem. Soc. 111, 8322-8323 (1989).

2. Malyshev, D. A. et al. Nature 509, 385-388
(2014).

3. Rich, A. in Horizons in Biochemistry (eds Kasha, M.
& Pullman, B.) 103-126 (Academic, 1962).

4. Bain, J. D., Switzer, C., Chamberlin, A. R.
& Benner, S. A. Nature 356, 537-539 (1992).

5. Schweitzer, B. A. & Kool, E. T. J. Am. Chem. Soc. 117,
1863-1872 (1995).

6. Malyshev, D. A. et al. Proc. Natl Acad. Sci. USA 109,
12005-12010 (2012).

7. Yamashige, R. et al. Nucleic Acids Res. 40,
2793-2806 (2012).

8. Yang, Z., Chen, F., Alvarado, J. B. & Benner, S. A.
J. Am. Chem. Soc. 133, 15105-15112 (2011).

9. Seo, Y. J., Hwang, G. T., Ordoukhanian, P. &
Romesberg, F. E. J. Am. Chem. Soc. 131,
3246-3252 (2009).

10. Lajoie, M. J. et al. Science 342, 357-360 (2013).


This article was published online on 7 May 2014. 


ORGANIC CHEMISTRY 


Collaborative synthesis 


A chemical synthesis has led to the reassignment of the molecular structure of the 
naturally occurring compound citrinalin B. This, in turn, helps to untangle the 
biochemical origins of an intriguing family of natural products. SEE ARTICLE P.318 


JOHN L. WOOD 


Synthetic organic chemistry has the
potential to provide convenient access
to any complex molecule with atomic
precision. Although this promise is still far 
from being fully realized, the progress made 
since Friedrich Wöhler’s preparation of urea
— the first synthesis of a naturally occurring 
molecule — nearly two centuries ago has been 
astonishing, and has enabled tremendous 
advances in agricultural, medicinal and mater- 
ials chemistry. In this issue, Mercado-Marin 
et al. (ref. 1) report the synthesis of the natural products
citrinalin B and cyclopiamine B. Their
work beautifully illustrates how natural prod- 
ucts continue to inspire research in synthesis, 
and shows that these efforts can benefit greatly 
from collaboration with experts in computa- 
tional chemistry and biosynthesis. 

This particular collaboration, led by the 
organic chemist Richmond Sarpong, was 
inspired by cyclopiamines A and B and citrin- 
alins A and B, four closely related members 
in the growing class of prenylated indoles”. 
Compounds in this class are generally derived 
from fungi and have a wide range of intriguing
biological properties, so their synthesis
and biosynthesis have been the focus of
considerable research. These efforts have 
established that compounds similar to cyclo- 
piamine A and citrinalin A probably derive 
from a common biosynthetic precursor that 
differentiates into enantiomers (non-super- 
imposable mirror-image isomers) of an inter- 
mediate compound, each of which serves as a 
precursor to a particular product. This notion 
is supported by the fact that large portions of 
the structures of cyclopiamine A and citrina- 
lin A are mirror images of each other (Fig. 1); 
the complete mirror-image symmetry of the 
enantiomeric precursors is destroyed in the 
course of the natural products’ biogenesis, and 
so cyclopiamine A and citrinalin A are said to 
be only pseudo-enantiomeric*”. 
Cyclopiamine B is thought to derive from 
cyclopiamine A, and Sarpong and colleagues 
speculated that citrinalin B similarly derives 
from citrinalin A. This would result in a 
pseudo-enantiomeric relationship between cit- 
rinalin B and cyclopiamine B. However, when 
they compared the reported structures of cyclo- 
piamine B and citrinalin B, confusion arose 
because the core structures are diastereomers 
(isomers that differ in the three-dimensional 
orientations of their atoms, but that are not 
Figure 1 | A structural inconsistency. Cyclopiamines and citrinalins are members of the prenylated
indole class of natural products. Cyclopiamine A converts to its B isomer in a reversible reaction, and is
a pseudo-enantiomer of citrinalin A; that is, the boxed section of cyclopiamine A is the mirror image
of the analogous section of citrinalin A. Mercado-Marin et al. (ref. 1) argued that citrinalin B must form from
citrinalin A in the same reversible reaction, but the structure originally assigned to the B isomer is not
a pseudo-enantiomer of cyclopiamine B, as would be expected (carbons 14 and 22 have the wrong
orientations). The authors therefore proposed a revised structure for citrinalin B, which is a
pseudo-enantiomer of cyclopiamine B. Me, methyl.
mirror images). Although diastereomeric relationships
are not uncommon among natural
products, it was difficult for the researchers to 
envisage a logical mechanism whereby citrina- 
lin A could give rise to citrinalin B. Unable to 
reconcile the biosynthetic origin of citrinalin B, 
Sarpong and colleagues questioned its reported 
structure and postulated a revised structure 
through which the pseudo-enantiomeric rela- 
tionship between cyclopiamine B and citrina- 
lin B is restored (Fig. 1). 

To provide support for this hypothesis, 
Sarpong consulted with the computational 
chemist Dean Tantillo, who, along with a col- 
league, calculated thermodynamic properties 
and simulated spectra for the cyclopiamines 
and citrinalins (original and revised struc- 
tures). The simulated data for the revised struc- 
ture of citrinalin B best matched those obtained 
experimentally for the isolated natural material. 
Moreover, the calculations predicted that the 
thermodynamically preferred products are, by 
a large margin, cyclopiamine B and citrinalin B. 
Thus, if the B isomers form biosynthetically 
from the corresponding A isomers, one would 
expect to isolate little, if any, of the A isomers. 
The relatively high abundance of the A isomers 
isolated from natural sources led Sarpong and 
co-workers to speculate that cyclopiamine B 
and citrinalin B may be artefacts of the condi- 
tions used in the isolation process. 

Having obtained computational support for 
the hypothesized structural revision, Sarpong 
and colleagues devised a synthesis of ‘revised’ 
citrinalin B to provide unambiguous confir- 
mation of its structure. They used the amino 
acid D-proline as the point of departure in a
16-step sequence that delivered an indole 
intermediate (Fig. 2). In a key step of the syn- 
thesis, the researchers then had to convert this 
intermediate to the corresponding oxindole, a 
compound that contains a synthetically chal- 
lenging carbon atom carrying four different 
groups, known as a stereogenic quaternary 
carbon. To ensure the conversion, the authors 
postulated that the indole would need first 
to undergo a diastereoface-selective epoxi- 
dation (selective introduction of an oxygen 
to the ‘back’ face of the molecule; the resulting
epoxide group is shown in blue in Fig. 2).
This would produce an intermediate that is 
poised for regioselective ring opening (selec- 
tive breaking of the carbon-oxygen bond in 
the epoxide that is distal to the adjacent indole 
nitrogen) and subsequent migration of a car- 
bon-carbon bond, controlled so as to set up 
a specific three-dimensional arrangement of 
atoms at the stereogenic carbon. 

The indole-to-oxindole transformation was 
fruitful, but its development required consid- 
erable experimentation, and success relied on 
several key structural features present in the 
indole substrate. Specifically, a primary amine 
group (NH₂) was crucial in guiding the facial
selectivity observed during the initial epoxida- 
tion step, and a chromanone unit (highlighted 
in red in Fig. 2) facilitated regioselective
opening of the presumed intermediate epoxide.
In the latter event, an electron pair on the
chromanone might help to break the distal
carbon-oxygen epoxide bond, whereas
undesirable competition by the indole nitrogen’s
lone pair is mitigated by the chromanone’s
carbonyl group (C=O), which withdraws
electron density from the nitrogen.

Figure 2 | Key step in the synthesis of ent-citrinalin B. In Mercado-Marin and colleagues’ synthesis of
the revised structure of ent-citrinalin B, a key step was to convert an indole intermediate to an oxindole
intermediate, ensuring the formation of the three-dimensional arrangement (the stereochemistry)
of groups around the stereogenic centre, which is highlighted in pink in the oxindole. (A chromanone
group in the indole is shown in red.) The authors used an oxidation reaction to attach an oxygen atom
to the ‘back face’ of the indole, forming an epoxide ring (blue). A regioselective ring-opening reaction
then occurred, facilitated by a pair of electrons (dots) from an oxygen atom in the chromanone; curly
arrows indicate electron movement. This reaction set up the desired stereochemistry around the carbon
atom (highlighted in green) that goes on to form the stereogenic centre of the oxindole. Subsequent bond
migration and ring contraction led to the formation of the target oxindole. Square brackets indicate
transiently formed reaction intermediates.


Sarpong and colleagues readily advanced
the resulting oxindole intermediate to make
the enantiomer of citrinalin B (ent-citrinalin B),
which, in turn, provided access to
cyclopiamine B through a rearrangement
of the chromanone group. Importantly, the
spectral data for these compounds were identical
to those of naturally occurring samples
(with the exception of the optical rotation of
ent-citrinalin B, the sign of which was opposite
to that for natural citrinalin B, as expected for
the enantiomer), thus confirming the postulated
structural revision.

Although most synthetic chemists would
have stopped at this stage, Sarpong chose to
take things further. In collaboration with the
group of Roberto Berlinck, the chemist who
originally isolated the citrinalins, Sarpong
initiated a series of studies to determine the
biogenetic precursors of citrinalins and cyclopiamines.
This proved to be highly successful,
not only establishing the major sources of carbon
atoms, but also resulting in the isolation
of citrinalin C, the structure of which supports
not only the hypothesis of enantiomeric
precursors for citrinalins and cyclopiamines,
but also the postulated biosynthetic link
between these compounds that initially led the
authors to propose the structural revision of
citrinalin B.


NEUROBIOLOGY


To care or not to care


The behaviour of adult mice towards pups varies depending on gender and sexual
experience. The activity of a population of neurons in the hypothalamus of the
brain has now been found to regulate these differing responses. SEE ARTICLE P.325


IVAN RODRIGUEZ


The survival of vertebrates depends
on their ability to display appropriate
behaviour towards nearby individuals,
for example towards predators, prey
or members of their own species. In some
cases, escaping death depends on being able
to trigger stereotyped responses in others. The
immediate survival of newborn mammals,
for instance, relies on their ability to induce
a caring social response in their parents. On
page 325 of this issue, Wu et al. identify a
subset of neurons in the hypothalamus of the
mouse brain that is essential for parental care.

For more than 100 million years, milk
has been the primary source of nutrition for
newborn mammals, which means that a daily
physical link between mother and young is
mandatory for newborn survival. Thus, by
not only gestating the fetus but also suckling
the newborn offspring, mammalian mothers
make a much larger investment in their progeny
than mammalian fathers. The contribution
of males to parenting is highly variable,
and depends on the species and experience of
the animal. In the presence of a pup, a male
mouse without sexual experience might show
indifference, but the norm is a physical attack
on the younger animal. This all changes after
mating: the experienced male’s behaviour
switches from aggressive to parental.

What a mouse perceives when it encounters
another mouse is largely processed by a 
specialized sensor in its nose — the vomero- 
nasal organ. This structure is linked to innate 
responses, including fight, flight and sex- 
specific behaviours, a property that led Wu 
and colleagues to propose that the vomero- 
nasal organ might be involved in regulat- 
ing the variable predisposition of male mice 
to parenting. To test this theory, the authors 
impaired vomeronasal signalling to the brain 
in virgin male mice, by genetically deleting an 
ion-channel protein (TRPC2) that is expressed 
only in the vomeronasal organ. The males 
behaved much less aggressively towards pups 
than their wild-type counterparts, suggesting 
that signalling from the vomeronasal organ 
promotes aggression, directly or indirectly, in 
virgin males. 

Wu and co-workers observed that a popu- 
lation of neurons in the medial preoptic area 
(MPOA) of the hypothalamus was activated 
in males and females that exhibited parental 
behaviour towards pups. But the neurons 
were less active in virgin males in the presence 
of newborns, suggesting that this neuronal 
population might be involved in promoting
caring behaviours and might be modulated by
vomeronasal signalling. This could have been 
the end of an already interesting story had the 
authors not found that a large portion of the 
neurons expressed the neuropeptide galanin. 
This finding enabled them to develop a tool 
with which to evaluate the role of these neu- 
rons in the regulation of parenting. 

The experiments that followed were sim- 
ple, at least conceptually, and involved two 
approaches. To disrupt the function of the 
galanin-expressing neurons in the MPOA, 
Wu et al. genetically engineered these cells to
die. In a parallel approach, they engineered 
the neurons to express the protein Channel- 
rhodopsin-2, which allowed their specific acti- 
vation in response to light. 

The results of these two complementary 
experiments were striking. First, depletion of 
MPOA galanin-expressing neurons in virgin 
females caused them to behave aggressively 
towards pups. Similarly, sexually experienced 
males and females showed deficits in parenting 
behaviour after galanin-expressing neurons 
were depleted. More remarkably, light stimu- 
lation of the neurons in virgin males inhibited 
their attacks on newborns, and even induced 
pup grooming (Fig. 1). These impressive data 
point to a crucial role for galanin-expressing 
neurons in mediating parental behaviour 
towards pups, thus linking these neurons to 
the survival chances of young mice. 

Several neural-circuit models can be built 
on the authors’ findings. To take the sim- 
plest, galanin-expressing neurons may act as 
a regulatory node that puts the brakes on a
default aggressive state promoted by signalling 
from the vomeronasal organ. Alternatively, 
these neurons might simply promote parent- 
ing behaviour. 

But naturally, questions arise. For example, 
given the known roles of various neuropeptides
in modulating behaviour, could galanin
itself be more than just a handy genetic marker
for these neurons? Supporting this hypothesis,
injection of galanin into the MPOA of Syrian
hamsters leads to changes in scent-marking
behaviour.

It seems that the galanin-expressing 
circuitry affects more than just parenting 
behaviour in mice. Wu and colleagues found 
that some social behaviours unrelated to 
pups, such as inter-male aggression, were also 
affected by activation of the neurons. This 
observation makes the story more complex, but 
it also opens up the possibility of dissecting the 
neural circuits that drive distinct behaviours. 

The switch between infanticide and parent- 
ing behaviour is not specific to rodents — it is 
pervasive in the animal kingdom, from birds to 
marine mammals, in both males and females. 
The best-known example is male lions, which 
kill young cubs when entering a new social 
group (in addition to evicting adult males 
and sub-adults). What is the advantage of
this infanticide? One obvious explanation is 
that, for the male to maximize the chances of 
producing his own offspring, it is beneficial to 
ensure that female lactation ends and ovula- 
tion restarts as soon as possible. 
Figure 1 | Modulating parenting behaviours in the presence of pups. Virgin female mice and sexually
experienced males and females show parenting behaviour towards young pups, but virgin males act
aggressively towards young. Wu et al. investigated the basis for these varying responses in the four groups
of mice. They report that the responses are modulated by a group of neurons in the medial preoptic area 
(MPOA) of the brain that express the protein galanin (inactive galanin-expressing neurons are pink, 
active ones are red, and neurons that do not express galanin are grey). These neurons are present in all 
four groups, but in the presence of pups, fewer are active in virgin males. Depletion of galanin-expressing 
neurons in virgin females or in sexually experienced mice of either sex causes impaired parental 
responses in the presence of pups and, in some cases, aggressiveness. Conversely, activation of 
galanin-expressing neurons in virgin males suppresses aggression and induces pup grooming. 
50 Years Ago


“The Mathematical Association 
Annual Conference 1964 — 
Administrative problems of
course loomed very large ... It
was easy to fail to appreciate the
possible influence of the liberal and
cultural qualities of mathematics
on sixth formers, who were given
opportunities to read round their
subject. The learning of mathematics
was organic and would grow
wherever it was given room; like
angling, it could “never be fully
learnt” ... It was a sombre fact
that nearly 25 per cent of persons
entering training colleges in 1962 did
not possess a pass at the Ordinary
Level of the General Certificate
of Education in mathematics ...
The discussion from the floor
represented the confusion which
exists in most teachers’ minds as
to how one can reconcile teaching
logic with geometry, when the only
logical way seemed to be to start at
the middle, work on establishing the
standard results, and then work back
and establish the premises ... the
discussion had shown that Euclid,
instead of being found at fault, had
proved himself to be too good for the
age group for which we tend to use
him. Unfortunately we had found
nothing to replace him for this age
group.

From Nature 16 May 1964 


100 Years Ago 


The annual report of the Hampstead
Scientific Society ... contains ...
a summary of the meteorological
statistics for the Hampstead
Observatory for 1913 ... For the
first time, average meteorological
data are included in the report ...
From these preliminary averages
it would appear that Hampstead
is the coldest, rainiest, snowiest,
and frostiest, as well as almost the
sunniest and foggiest of the stations
in the neighbourhood of London.
From Nature 14 May 1914 
It is unclear whether infanticide provides
such benefits to virgin male mice, given the
fast weaning times and frequent oestrous
cycles that are characteristic of this species.
But aggressive behaviours spontaneously re-emerge
in male mice 50 days after mating:
exactly the length of time it takes for their
progeny to be born and weaned. Furthermore,
exposure of a pregnant female to the scent of
an unfamiliar male mouse is sufficient to cause
termination of pregnancy. As such, mice may
be the champions of infanticide.

A picture is emerging in which regulatory
nodes of social interactions switch on specific
neural circuits at the expense of others. These
circuits underlie stereotyped behaviours, and
coexist in both males and females, whether
they are sexually experienced or not. But only
under specific conditions are they activated.
This is a remarkable example of the modularity
and versatility of mammalian brains.


SENSORY BIOLOGY


Radio waves zap the
biomagnetic compass 


Weak radio waves in the medium-wave band are sufficient to disrupt geomagnetic 
orientation in migratory birds, according to a particularly well-controlled study. 
But the underlying biophysics remains a puzzle. SEE LETTER P.353 


JOSEPH L. KIRSCHVINK 


Magnetobiology has largely been
regarded as a stamping ground for
charlatans since the followers of
physician Franz Anton Mesmer failed to cure 
patients using a ‘magnetized’ tree in the eight- 
eenth century. Numerous discoveries have 
begun to change that perspective, although 
the road has been rocky. For example, early 
studies suggesting that migrating animals use 
geomagnetic cues for navigation were plagued 
by variability, but it is now clear that many 
microorganisms and animals use a magnetic 
compass for part of their orientation.

On the fringe of this fringe field were claims 
that radio-frequency radiation could have bio- 
logical effects at levels too weak to act through 
the understood mechanisms of tissue heat- 
ing or shock, but the experiments usually 
lacked proper controls and blinding techniques.
Now, however, on page 353 of this
issue, Engels et al. demonstrate convincingly
that migrating European robins stop using 
their magnetic compasses in the presence 
of extraordinarily weak, radio-frequency 
electromagnetic ‘noise’. 

Using rigorous, double-blinded experi- 
ments, the authors found that birds housed in 
huts screened from background electromag- 
netic noise were able to use their magnetic 
compass to orient themselves appropriately, 
but that their orientation was disrupted fol- 
lowing the introduction of electromagnetic 
noise ranging from 20 kilohertz to 5 mega- 
hertz, at intensities similar to that measured 
for background anthropogenic noise in the 
environment. To put it into perspective, this
is in the medium-wave band used for AM
radio transmissions (not, for example, mobile
phones), and the strength is about equivalent
to what a bird in flight might experience 5 kilo- 
metres away from a 50-kilowatt AM radio 
station. 
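
As a rough check on that comparison, the field from such a transmitter can be estimated with a short Python sketch. It assumes an idealized isotropic radiator in free space, which ignores antenna gain, ground-wave propagation and terrain, so the numbers are order-of-magnitude only. The function name far_field_amplitudes is illustrative and is not taken from Engels and colleagues.

```python
import math

# Physical constants (SI units).
VACUUM_IMPEDANCE_OHM = 376.730313668   # impedance of free space
SPEED_OF_LIGHT_M_S = 299_792_458.0


def far_field_amplitudes(power_w, distance_m):
    """Estimate far-field quantities for an isotropic source in free space.

    Returns a tuple of (power density in W/m^2, rms electric field in V/m,
    rms magnetic flux density in T). Assumption: the transmitter radiates
    equally in all directions and nothing absorbs or reflects the wave.
    """
    # Radiated power spread evenly over a sphere of radius distance_m.
    power_density = power_w / (4.0 * math.pi * distance_m ** 2)
    # Time-averaged power density S = E_rms^2 / Z0, so E_rms = sqrt(Z0 * S).
    e_rms = math.sqrt(VACUUM_IMPEDANCE_OHM * power_density)
    # In a plane wave the magnetic flux density is B = E / c.
    b_rms = e_rms / SPEED_OF_LIGHT_M_S
    return power_density, e_rms, b_rms


if __name__ == "__main__":
    # The 50-kilowatt station and 5-kilometre distance quoted in the text.
    s, e, b = far_field_amplitudes(50e3, 5e3)
    print(f"power density: {s:.3e} W/m^2")
    print(f"electric field (rms): {e:.3f} V/m")
    print(f"magnetic component (rms): {b:.3e} T")
```

Under these assumptions, the electric field comes out at roughly a quarter of a volt per metre and the magnetic component at a little under one nanotesla, tens of thousands of times weaker than a geomagnetic field of about 50 microtesla.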

Two results flag this study as particularly 
noteworthy, and puzzling. First, the levels of 
radio-frequency radiation that affected the 
birds’ orientation are substantially below anything
previously thought to be biophysically
plausible, and far below levels recognized as
affecting human health. Second, the authors
detect no trace of a sharply enhanced effect
at the Larmor frequency (the natural frequency
at which single electrons wobble around the
geomagnetic-field direction), which flatly contradicts
experiments on the same species performed
using a similar protocol. This failure
to replicate that effect perhaps underscores
previously suggested flaws in the blinding of
earlier studies.

Figure 1 | Biological ‘magnetomonsters’. Several fossil and extant organisms contain highly magnetic
structures. Examples include: a, Magnetobacter bavaricum, a magnetotactic bacterium with nearly
100 times more magnetite in its cells than more typical types; b, Cryptochiton stelleri, a mollusc whose
magnetite-capped radular teeth will stick strongly to a hand magnet; c, a spearhead-shaped magnetite
particle (false-coloured red), prismatic magnetite rods (purple) and typical magnetite-containing
bacterial organelles (magnetosomes; green); d, a bundle of magnetite rods forming ‘wires’. The structures
shown in c and d were extracted from fossilized clay sediments in New Jersey dating to approximately
56 million years ago. The origins of the spearhead- and rod-shaped objects are not known, but their size
and morphology suggest that they might have belonged to more-complex organisms. Cellular structures
containing enough electrically conducting magnetite could be sensitive to radio-frequency radiation at
levels shown by Engels et al. to disrupt birds’ geomagnetic orientation.
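
For orientation, the Larmor frequency mentioned above can be estimated from the gyromagnetic ratio of a free electron. The Python sketch below assumes a geomagnetic field of 25 to 65 microtesla (typical values at Earth's surface, not figures from the study), and its function names are illustrative.

```python
# Free-electron gyromagnetic ratio divided by 2*pi, in hertz per tesla.
ELECTRON_GAMMA_OVER_2PI_HZ_PER_T = 28.0249514e9

# Band of electromagnetic noise applied in the experiments, as quoted in the text.
NOISE_BAND_HZ = (20e3, 5e6)


def larmor_frequency_hz(field_tesla):
    """Larmor (precession) frequency of a free electron spin in a static field."""
    return ELECTRON_GAMMA_OVER_2PI_HZ_PER_T * field_tesla


def in_noise_band(frequency_hz, band=NOISE_BAND_HZ):
    """True if the frequency lies inside the quoted noise band."""
    low, high = band
    return low <= frequency_hz <= high


if __name__ == "__main__":
    # Assumed range of geomagnetic field strengths, in tesla.
    for field in (25e-6, 50e-6, 65e-6):
        f = larmor_frequency_hz(field)
        print(f"B = {field * 1e6:.0f} microtesla: f = {f / 1e6:.2f} MHz, "
              f"inside band: {in_noise_band(f)}")
```

In a 50-microtesla field the result is about 1.4 megahertz, which lies inside the 20-kilohertz to 5-megahertz band of noise described in the text.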

So what might be going on in these birds? 
Several other external stimuli that stop ani- 
mals from responding to geomagnetic cues 
have been identified. Early studies of animal 
navigation noted that cues from the Sun or 
stars would take precedence over magnetic 
cues, leading to the idea that magnetism is the 
compass of last resort. It was then noticed that 
robins would ignore the magnetic field when 
the background intensity was shifted 20-30% 
outside the normal value, and that pigeons
raced poorly during geomagnetic storms. 
From an evolutionary perspective, ignoring 
geomagnetic cues at such times makes sense, 
because anomalies in the background field are 
often associated with iron deposits or lightning 
strikes. Some animals also stop using their 
magnetic compass in the presence of red-only 
light, but such light is present only at sunrise 
and sunset, when the Sun compass is most 
reliable.

Hence, radio-frequency noise might be 
just another cue that tells migrating animals 
to ignore their magnetic sense, but the puzzle 
is why this might have evolved. Surprisingly, 
there is a natural source of the radio-frequency 
electromagnetic noise identified as disruptive 
by Engels and colleagues — that produced by 
solar storms. Coronal mass ejection (CME) 
events from the Sun slam plasma into Earth’s 
magnetosphere every now and then, caus- 
ing it to ‘sing’ at frequencies from as low as 
around 20 kHz up to the MHz range, some
of which even leaks through Earth's normally 
radio-opaque ionosphere; the lower end of this 
range is remarkably close to that identified by 
the authors. These CME events generate the 
beautiful polar auroras, disrupt our use of the 
medium-wave radio band, and sometimes 
perturb the background geomagnetic field 
at Earth’s surface enough to disturb animal 
navigation. 

All known sensory systems in animals are 
based on cells specialized to convert the stim- 
ulus of interest into a coded stream of action 
potentials that are sent to the brain. If the
effects of radio-frequency radiation are real,
such cells must exist, but the mystery is in the
biophysics. The lack of an enhanced effect at
the Larmor frequency, and the low levels of
radiation concerned, make it unlikely that a
previously proposed mechanism for radio-sensing,
based on light activation of a cellular
protein called cryptochrome, is involved. But
some magnetic effects on animals (such as that
of a short, sharp magnetic pulse) function
through biological magnetite (Fe₃O₄) in tissue
— might this also be the radio-wave detector? 

If it is, how could such a detection mecha- 
nism have arisen? Early animals that had a 
simple compass patterned along the lines of 
magnetotactic bacteria would have needed to 
survive geomagnetic excursions or reversals
(periods in which Earth’s magnetic field
weakened), and natural selection would have
favoured individuals with higher cellular volumes
of magnetite. When the field recovered,
animals would have been left with cells that
have surprisingly large magnetic moments
(Fig. 1). Such cells might then have evolved to 
serve other functions, such as intensity-based 
magnetic navigation systems, increasing 
the amount of magnetite further. With large 
enough volumes of metallically conductive 
magnetite in these cells, direct detection of the 
small electric and magnetic vectors of radio- 
frequency radiation might have emerged, as 
Engels and colleagues suggest. 

Do the authors’ findings have implications 
for humans? It seems that geomagnetic sen- 
sitivity dates back to an early ancestor of ani- 
mals, and it is clearly present in many extant 
mammalian species. Human tissues also 
contain biological magnetite. Many people
claim to be bothered by radio transmissions, 
and some have even moved to live in radio- 
frequency ‘quiet zones’ around radio tele- 
scopes. Modern-day charlatans will undoubt- 
edly seize on this study as an argument for 
banning the use of mobile phones, despite the 
different frequency bands involved. However, 
if the effect reported by the authors stands 
the acid test of reproducibility, we might con- 
sider gradually abandoning our use of this 
portion of the electromagnetic spectrum and
implementing engineering approaches to
minimize incidental low-frequency noise, to
help migratory birds find their way.
Geology and climate
drive diversification 


Data from the Galapagos Islands exemplify how geology and climate can interact 
to cause episodes of isolation and fusion of the biota across a landscape. Different 
scales of such cycles dictate varying mechanisms of species generation. 


ROSEMARY G. GILLESPIE 
& GEORGE K. RODERICK 


Writing in the Journal of Biogeography,
Ali and Aitchison examine geological
and climatic events over the
past 700,000 years, namely island ontogeny and 
shifting sea levels, and their effects on biodiver- 
sity in the Galapagos Islands. The authors pro- 
pose a process that can be considered a general 
evolutionary mechanism: that the dynamics 
of isolation caused by geological and clima- 
tological processes plays a fundamental part 
in shaping diversity. Whether these processes 
promote or constrain species diversifica- 
tion, however, depends on the spatial (global, 
regional or local) and temporal (multimillion, 
multimillennial or multidecadal) scales and 
periodicity of isolation and coalescence. 
Geological events have long been known to
mould and shape biodiversity. A breakthrough 
in understanding the underlying mechanisms 
came with the recognition that ancient split- 
ting of landmasses resulted in shared diversity. 
The concept of vicariance biogeography — the 
separation of a group of organisms by a geo- 
graphical barrier — provided the means for 
rigorous hypothesis testing in a hitherto largely 
descriptive field. This established that vicari- 
ance resulting directly from geological events 
can cause diversification, such that geological 
history will be clearly reflected in the result- 
ing biotic assemblages. The isolation created 
by ancient geological events is fundamental. 
Yet, what is given is frequently taken away — 
separate land masses can become connected 
and biotic assemblages reunited to various 
degrees. For example, the Great American 
Interchange associated with the formation of
the Isthmus of Panama some 3 million years
ago allowed the exchange of biotas of North
and South America, each of which had evolved
in isolation.

Figure 1 | Isolation across scales of space and time. a, On a multimillion-year timescale, distinct
assemblages of biota form in isolation on different continents, and joining of the continents may lead
to biotic exchange or species displacement. b, On a millennial timescale, different species may evolve
in isolation on different islands; connections between these islands can then result in a richer biological
assemblage as newly formed species come together. c, On a decadal timescale, populations may be
isolated by recurrent events, such as volcanic eruptions; subsequent reconnection of these populations
may result in new genetic combinations, a phenomenon that may also occur with invasive species.

There is increasing evidence that geologi- 
cal and climatological events over shorter 
timescales (10,000 to 1 million years) and over 
smaller spatial scales (regional or local rather 
than global) can also influence the diversifica- 
tion process. A prime example of insights to 
emerge from such situations is provided by Ali 
and Aitchison in their model, which integrates 
the dynamics of geological and climatological 
events that repeatedly connected and discon- 
nected islands of the Galapagos over the past 
700,000 years. During this time period, these 
intermittent connections allowed otherwise 
landlocked vertebrates to disperse to other 
islands and to reconnect with populations that 
had previously been separated. More broadly, 
Ali and Aitchison use detailed palaeogeo- 
graphical reconstructions to provide a series 
of explicit hypotheses about past population 
structuring and species formation that can be 
tested with molecular and geospatial data from 
extant species. 

Although repeated separation and reunit- 
ing of biotas over a multimillennial time frame 
can obscure older geological and climatologi- 
cal events, the dynamics of geology and climate 
can be powerful forces in generating biodiver- 
sity. For example, work focused on time frames 
of 10,000 to 100,000 years ago has shown that 
climatological events can act as a ‘species pump’,
in which periods of warming or drying serve to
alternately isolate and reunite biotas. This phenomenon
is well illustrated by Indonesian ants
in the Sundaland rainforest, in which diversification
has been attributed to repeated episodes
of separation and connection of populations
during the Plio-Pleistocene (around 5 million
to 12,000 years ago), associated with fluctuating
sea levels and climate. Similar episodic isolation
associated with climatic shifts has been
inferred for the diversification of Amazonian
vertebrates in the Pleistocene (around 2 million
to 12,000 years ago). Even in the oceans,
which have the potential for extensive mixing,
opposing processes of isolation and exchange
seem to have been responsible for much
diversification.

On more recent (decadal) timescales and in 
localized areas, the same mechanism of isola- 
tion and mixing resulting from the combined 
effects of geological and climatological events 
can influence patterns of diversity prior to the 
formation of species. For example, population 
mixing and hybridization — resulting from 
previously separated populations coming back 
into contact — have had a key role in generat- 
ing adaptive variation and functional novelty 
in populations of cichlid fish in African lakes.
Indeed, the repeated isolation and subsequent
mixing of populations in new combinations
may serve as an ‘evolutionary crucible’ to facilitate
and potentially accelerate diversification.
Furthermore, the negative consequences of
founder effects (the reduced genetic variation
that occurs when a population is established by
a small number of individuals) may be offset if
different colonization events result in multiple
genotypes within the introduced population.
This process highlights the potential role of
mixing among successively colonizing populations
in providing the genetic variation to
allow adaptive evolution.

Thus, geological and climatological dynam- 
ics over time and space shape the patterns of 
biodiversity that we can observe and measure 
today. Specifically, the periodicity of isolation 
and connection dictates evolutionary out- 
comes, and understanding of this dynamic 
has become the focus of genomic approaches.
When species evolve in isolation over long 
(multimillion-year) periods, reconnections 
can result in exchange or displacement of 
entire assemblages (Fig. 1a). When species are 
isolated for long enough to allow speciation 
(multimillennia), subsequent connections may 
unite newly formed species, thereby generating 
diversity (Fig. 1b). And when populations are 
isolated locally and over decadal timescales, 
thus prompting the development of disparate 
gene pools, subsequent reconnection and mix- 
ture can create new genetic combinations upon 
which selection can act (Fig. 1c). 

Moving to the present and future, it is clear 
that, as organisms shift their distributions in 
response to climate change and as globaliza- 
tion increasingly homogenizes previously 
isolated biotas, understanding the role of 
historic isolation and recent connection in 
biodiversity dynamics is crucial. Indeed, the 
mixing of previously isolated populations is 
characteristic of many invasive species, sug- 
gesting a role for novel genetic combinations 
in their successful establishment in new envi- 
ronments. An increased knowledge of past 
events of isolation and fusion of biotas, such 
as that provided by Ali and Aitchison, will 
better equip us to predict future dynamics of 
biodiversity.
Recent advances in homogeneous nickel catalysis


Sarah Z. Tasker, Eric A. Standley & Timothy F. Jamison


Tremendous advances have been made in nickel catalysis over the past decade. Several key properties of nickel, such as 
facile oxidative addition and ready access to multiple oxidation states, have allowed the development of a broad range of 
innovative reactions. In recent years, these properties have been increasingly understood and used to perform transfor- 
mations long considered exceptionally challenging. Here we discuss some of the most recent and significant developments 
in homogeneous nickel catalysis, with an emphasis on both synthetic outcome and mechanism. 


To the uninitiated, nickel might seem like just the impoverished
younger sibling of palladium in the field of transition metal catalysis.
After all, the use of palladium-catalysed cross-coupling has
skyrocketed over the past half-century: it was honoured with the 2010 Nobel
Prize in Chemistry, and is ubiquitous in applications that range from
complex natural product synthesis to drug discovery to manufacturing. Nickel
lies just above palladium in the periodic table, and as a group 10 metal, it
can readily perform many of the same elementary reactions as palladium or
platinum. Because of these commonalities, nickel is often viewed solely as
a low-cost replacement catalyst for cross-coupling reactions. However, this
common misconception is clearly refuted by the numerous and diverse
nickel-catalysed reactions reported in the literature. Indeed, homogeneous nickel
catalysis is currently experiencing a period of intensified interest. In this
Review, we aim to use recent developments in organonickel chemistry to
illustrate how the intrinsic properties of nickel have enabled its use as an
effective catalyst for many intriguing, valuable and difficult transformations.

Historically, the use of nickel in organometallic reactions pre-dates many
other examples of transition metal catalysis. Nickel was isolated in 1751;
its name is derived from the German Kupfernickel, the name given to a
nickel ore originally believed by miners to contain copper, but which did not
yield copper on extraction (hence the use of Nickel, a mischievous demon). In
the 1890s, Mond observed one of the unusual reactivity patterns of nickel:
elemental nickel and CO reacted at room temperature to form Ni(CO)4, an
extremely toxic, low-boiling liquid, which could be used to purify the metal.
Shortly thereafter, Sabatier performed the first hydrogenation of ethylene
using nickel, for which he was awarded the 1912 Nobel Prize in Chemistry.
But undoubtedly, one of the most prominent and prolific early contributors
to organonickel chemistry was Wilke. Wilke made seminal contributions
to the structure and reactivity of nickel complexes, including the synthesis
of Ni(cod)2 (a ubiquitous source of complexed zero-valent nickel, Ni(0); cod,
1,5-cyclooctadiene) and investigation of olefin oligomerization reactions.
Beginning in the 1970s, nickel found extensive use both for cross-coupling
and reactions of alkenes and alkynes, such as nucleophilic allylation,
oligomerization, cycloisomerization and reductive coupling. Many excellent books
and reviews of organonickel chemistry in general, as well as of specific
transformations (for example, reductive coupling and cross-coupling), already
exist. Consequently, we have chosen to focus on key advances in
nickel-catalysed reactions since 2005 and to highlight how researchers can take
advantage of nickel’s characteristic properties and reactivity to perform
innovative and useful transformations. Whereas applications of nickel chemistry
span materials science, polymer synthesis and biocatalysis, this Review
encompasses only homogeneous nickel catalysis relevant to small molecule
synthesis. Additionally, owing to the short nature of this Review and the
breadth of nickel chemistry, we are unable to include discussions of all the
exemplary methods developed in the past decade. However, we hope that
the selected reactions and mechanistic studies presented herein spark
further investigation into the full range of nickel-catalysed reactions.


Mechanism and elementary steps 


Before discussing each class of transformation, we survey nickel’s characteristic
modes of reactivity, particularly with regard to some of the elementary
steps of transition metal catalysis (Fig. 1). Nickel is a relatively
electropositive late transition metal. Therefore, oxidative addition, which
results in loss of electron density around nickel, tends to occur quite readily
(though, conversely, reductive elimination is correspondingly more difficult).
This facile oxidative addition allows for the use of cross-coupling
electrophiles that would be considerably less reactive under palladium catalysis,
such as phenol derivatives, aromatic nitriles or even aryl fluorides.

Nickel also has a number of readily available oxidation states commonly
invoked in catalysis. The majority of palladium-catalysed reactions are based
on a Pd(0)/Pd(II) catalytic cycle, and most often proceed through polar (that
is, non-radical) mechanisms. Likewise, Ni(0)/Ni(II) catalytic cycles are widespread,
but the easy accessibility of Ni(I) and Ni(III) oxidation states allows
different modes of reactivity and radical mechanisms. As a result, many
transformations are based on Ni(I)/Ni(III), Ni(0)/Ni(II)/Ni(I), or even cycles
in which nickel remains in the Ni(I) state for the entire catalytic cycle.
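
The oxidation-state bookkeeping behind these cycles can be sketched in a few lines of Python. This is only an illustration: the step names, the `STEP_DELTA` table and the two step sequences are assumptions chosen to represent a generic polar Ni(0)/Ni(II) cycle and a generic radical Ni(I)/Ni(III) cycle, not mechanisms taken from any specific study.

```python
# Formal change in the metal oxidation state for each elementary step.
STEP_DELTA = {
    "oxidative_addition": +2,     # two-electron (polar) pathway
    "halogen_abstraction": +1,    # metal takes X from R-X, leaving an R radical
    "radical_capture": +1,        # metal combines with the R radical
    "transmetallation": 0,        # ligand exchange, no redox change at the metal
    "reductive_elimination": -2,  # forms the product bond
}


def trace_cycle(start_state, steps):
    """Return the oxidation states visited, beginning with start_state."""
    states = [start_state]
    for step in steps:
        states.append(states[-1] + STEP_DELTA[step])
    return states


def is_closed(start_state, steps):
    """A catalytic cycle must return the metal to its starting state."""
    return trace_cycle(start_state, steps)[-1] == start_state


# Hypothetical step orderings, used only to illustrate the bookkeeping.
polar_cycle = ["oxidative_addition", "transmetallation", "reductive_elimination"]
radical_cycle = ["transmetallation", "halogen_abstraction",
                 "radical_capture", "reductive_elimination"]

print(trace_cycle(0, polar_cycle))    # [0, 2, 2, 0]
print(trace_cycle(1, radical_cycle))  # [1, 1, 2, 3, 1]
```

Splitting the radical pathway into two one-electron steps is what lets the sketch pass through the odd-numbered Ni(I) and Ni(III) states that a purely two-electron cycle never visits.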

Many nickel complexes have long been known as privileged catalysts for
reactions of alkenes and alkynes, such as oligomerization or reductive coupling.
Nickel readily donates d-electrons to π-acceptors, so olefin bonding is
generally strong. β-Hydride elimination tends to be slower with nickel
relative to palladium; specifically, the energy barrier to Ni-C bond rotation
prior to β-hydride elimination is often significantly higher for nickel than
for comparable palladium species.

Finally, there are a few more obvious differences between nickel and its
group 10 counterparts. Practically speaking, the cost of nickel in its elemental
form is roughly 2,000 times lower than palladium and 10,000 times lower
than platinum on a mole-for-mole basis, though the price of commonly used
nickel sources for catalysis can be less favourable. As a first-row transition
metal, nickel has a small atomic radius, and Ni-ligand bond lengths are
often relatively short. Researchers have been taking advantage of all of the
above features to develop new reactions, which are demonstrated in the
specific examples that follow.
Figure 1 | Nickel fundamentals. a, Comparison of basic characteristics of
nickel and palladium, including accessible oxidation states (top row, states in 
bold are more commonly involved in catalysis) and trends in reactivity. 

b, Prototypical examples of elementary organometallic reaction steps, 
highlighting changes in oxidation state at nickel. Additional ligands bound to 
nickel not involved in each transformation are omitted for clarity. Ar, aryl; 
M, metal; Me, methyl. 


Cross-coupling 

Building on discoveries and developments from the 1970s, nickel has proved 
to be an extremely effective catalyst for cross-coupling. Cross-coupling reac- 
tions are transition-metal-catalysed reactions originally developed as a means 
to synthesize biaryls from arylmetal species and aryl halides or pseudohalides. 
Alongside palladium, nickel has been used extensively for Suzuki-Miyaura 
(organoboron reagents) and Negishi (organozinc reagents) cross-coupling 
reactions, in particular. In more recent years, the scope of cross-coupling
reactions has expanded far beyond simple biaryl synthesis to include many 
other types of coupling partner. Nickel catalysis has been at the forefront 
of this expansion, as will be demonstrated in the following sections. 


Cross-coupling of aryl halides 

The ability of nickel to oxidatively insert into carbon-heteroatom bonds with
ease is of particular advantage in the Suzuki-Miyaura reaction of heteroaromatic
boronic acids, which readily undergo protodeboronation under
the basic (and usually hydrous) reaction conditions, particularly at elevated
reaction temperatures. One useful contribution comes from Hartwig and Ge,
who have developed a method for the synthesis of heterobiaryls that takes
advantage of this rapid oxidative addition by combining a lowered reaction
temperature with readily activated precatalyst 3 to afford the traditionally
difficult-to-access heterobiaryls (Fig. 2a). Additionally, precatalyst 3 possesses
adequate stability to be handled in air, and only 0.5 mol% of the precatalyst
is required, making this a highly efficient reaction. This method
is applicable to both heteroaryl chlorides and bromides and gives high yields
of products across an impressive range of substrates. A related method for
the synthesis of heterobiaryls was disclosed by Garg and co-workers in
2013 (Fig. 2b). This method focused on further improving the efficiency
of Suzuki-Miyaura couplings as well as employing ‘green’ solvents such as
2-MeTHF and t-amyl alcohol.

Figure 2 | Recent nickel-catalysed Suzuki-Miyaura arylations.

a, Cross-coupling of heteroaryl boronic acids (1) with heteroaryl halides (2) to
form heterobiaryls (4) is a long-standing challenge. This method, employing an
air-stable nickel catalyst precursor 3, provides the desired heterobiaryls in
excellent yields. b, Cross-couplings developed for small-scale use are often
carried out in solvents poorly suited to industrial or large-scale use. As such,
the adaptation of the Suzuki-Miyaura cross-coupling to form (hetero)biaryls
such as 7 using ‘green’ solvents while still obtaining the products in high yield
is a valuable development. dppf, 1,1'-bis(diphenylphosphino)ferrocene;
cinnamyl, trans-C6H5CH=CHCH2-; (Het)Ar, heteroaryl; THF,
tetrahydrofuran; Cy, cyclohexyl; Ms, methanesulphonate; Piv, pivaloyl.
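
The 0.5 mol% loading quoted for precatalyst 3 sets an upper bound on the turnover number (TON), the moles of product formed per mole of catalyst. A minimal Python sketch of that arithmetic follows; the function name and the 90% yield in the second call are illustrative assumptions, not values from the work discussed.

```python
def turnover_number(yield_percent: float, loading_mol_percent: float) -> float:
    """Moles of product per mole of catalyst (TON).

    Both arguments are percentages relative to the limiting substrate.
    """
    if loading_mol_percent <= 0:
        raise ValueError("catalyst loading must be positive")
    if not 0 <= yield_percent <= 100:
        raise ValueError("yield must lie between 0 and 100")
    return yield_percent / loading_mol_percent


# Upper bound for a 0.5 mol% loading (quantitative yield).
print(turnover_number(100.0, 0.5))  # 200.0
# Hypothetical 90% yield, chosen only for illustration.
print(turnover_number(90.0, 0.5))   # 180.0
```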


Cross-coupling of phenol derivatives 

Although using aryl halides as cross-coupling partners is the de facto standard,
accessing the desired halide coupling partner is not always trivial and
can sometimes be extremely challenging. One well-established solution
is the use of aryl triflates (Fig. 3a). They typically possess extremely high reactivity
in cross-coupling reactions and, because they are obtained by reaction
of a phenol with triflic anhydride, are derived from a pool of materials entirely
separate from aryl halides. For these two reasons, triflates are valuable coupling
partners. However, triflates are prone to hydrolysis, especially under basic
conditions, making their use for certain reactions challenging. Tosylates
and mesylates (Fig. 3b), close relatives of triflates, have found use in cross-coupling
reactions, as their greater stability reduces or eliminates the problem
of hydrolysis. However, their reactivity towards oxidative addition by
metals is also considerably reduced, often leading to the need for harsher
reaction conditions. For these reasons, catalyst systems capable of activating
the C-O bonds of functional groups other than triflates, such as
ethers, esters, carbamates and carbonates (Fig. 3c), are highly desired. Reactions
based on such catalyst systems would represent an ideal combination
of ready access to coupling partners that can be mildly and selectively
activated by nickel, yet are robust enough to withstand many conditions
that would degrade or decompose the analogous triflate.

Figure 3 | Halogen alternatives used in cross-coupling reactions. a, Aryl
triflates have long been used as replacements for halogens in cross-coupling
reactions. Aryl nonaflates were developed later to address some of the issues
encountered when working with aryl triflates, but their use is less widespread.
b, The use of aryl mesylates, tosylates and sulphamates presents many
advantages over triflates and related fluorinated sulphonates owing to their
increased stability. c, Like sulphonate derivatives, the use of carboxylic esters,
carbonates, carbamates, ethers and silyl ethers can be advantageous in many
situations. p-Tol, para-tolyl (4-methylphenyl); TMS, trimethylsilyl.

Nickel has been known to activate C-O bonds since as early as 1977, and
since this time, a number of advances in this field have been reported. For
example, Kocienski and Dixon developed the first effective Ni(0)-catalysed
cross-coupling of vinyl carbamates with organomagnesium reagents in
1989, and several years later Snieckus and co-workers described the Ni(0)-catalysed
cross-coupling of aryl carbamates and organomagnesium reagents
in concert with further functionalization by directed ortho-metallation.
Subsequently, Snieckus also reported the Ni(0)-catalysed cross-coupling of
tertiary aryl sulphamates with organomagnesium reagents.

Around this time, the field experienced an increased interest in the nickel-mediated
activation of these ‘inert’ C-O bonds. A seminal development in
the field came from Dankwardt, who described the cross-coupling of aryl
ethers (8) with arylmagnesium reagents (9) in a Kumada-Corriu-type coupling
(Fig. 4a). Another critical advance in this area of research came from
Chatani and co-workers in 2008, when they reported the first Suzuki-Miyaura
cross-coupling using aryl ethers (8) and boronic esters (10) (Fig. 4b). Previously,
all couplings of this type had employed Grignard reagents, which
are generally poorly compatible with many common functional groups; the
change to boronic esters, however, provided a considerable improvement
to the substrate scope and allowed much easier implementation of this
chemistry.

In 2008, in nearly simultaneous reports, Shi and co-workers described the
Suzuki-Miyaura cross-coupling of aryl acetates and pivalates with boroxines
(13), while Garg and co-workers described the Suzuki-Miyaura cross-coupling
of aryl pivalates (11) with boronic acids (12) (Fig. 4c). Both methods
are capable of producing a wide variety of biaryls, demonstrating their
viability as valuable alternatives to the use of traditional aryl halides and
sulphonate esters. Following these initial reports, it was demonstrated that
organozinc reagents and aryl alkoxides can be used, further expanding
the range of available coupling partners. Additional advances were realized
through a collaborative experimental and theoretical investigation carried
out by Houk, Snieckus, Garg and co-workers.
Figure 4 | Milestones in cross-coupling reactions of aryl ethers and esters.
a, Kumada-Corriu-type, nickel-catalysed biaryl formation from aryl ethers
(8) and organomagnesium (Grignard) reagents (9). b, Suzuki-Miyaura-type, 
nickel-catalysed biaryl synthesis from aryl ethers (8) and boronic esters (10). 
c, Suzuki-Miyaura-type, nickel-catalysed biaryl synthesis using aryl esters (11) 
and aryl boronic acids (12) or aryl boroxines (13). Ph, phenyl; t-Bu, 
tert-butyl; Et, ethyl; cod, 1,5-cyclooctadiene; i-Pr, isopropyl; Mes, 2,4,6- 
trimethylphenyl. 
In the period since these reports, many fruitful developments have been
made to allow the transformation of many types of readily accessible phenol
derivative into valuable biaryls, and other types of reaction beginning
from phenol derivatives have since followed, such as the Mizoroki-Heck
reaction reported by Watson and co-workers. In addition to C-C bond
forming reactions, methods for the amination and reduction of phenol
derivatives have been disclosed.


Benzylic cross-coupling 

Another mode of bond activation that has come to prominence, at least in 
part thanks to nickel catalysis, is the activation of benzylic C-O bonds.
Benzylic ethers, esters, carbonates, carbamates and, in some instances, even 
free alcohols (via the corresponding magnesium alkoxide) can be activated 
by low-valent nickel. With proper choice of starting material and organo- 
metallic reagent, the reaction products can be di- or triarylalkanes, both of 
which are ubiquitous motifs in drug targets, natural products and materials 
applications. Perhaps the most significant feature of these transformations 
is their high stereospecificity, which allows access to these products in highly 
enantioenriched form from the (readily available) corresponding enantio- 
enriched mono- or diarylmethanol. 

In 2011, Jarvo and co-workers disclosed the first stereospecific nickel-catalysed
alkyl-alkyl cross-coupling reaction (Fig. 5a). This method, in
contrast to previous nickel-catalysed cross-couplings using sp³ electrophiles,
does not racemize the alkyl electrophile and in this instance provides clean
inversion of the starting stereochemistry. Therefore, the existing stereochemistry
of the starting material (14) can be used as the only source of chiral
information, rather than relying on catalyst control with or without a directing
group that must be later removed from the molecule. Jarvo demonstrated
the synthesis of several interesting and useful molecules using this
method, including the anti-cancer agent 16 from the corresponding methyl
ether. The product was obtained with 96% enantiomeric excess (e.e.) beginning
from material of 98% e.e., which demonstrates the high stereospecificity
of this method. A subsequent publication demonstrated the use of
the 2-methoxyethyl ether moiety as a directing/activating group, which
greatly increases the ease with which oxidative addition into the C-O bond
takes place.
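
The stereospecificity claim can be put on a number. Enantiospecificity (e.s.) is conventionally taken as the e.e. of the product divided by the e.e. of the starting material; the short Python sketch below applies that definition to the 96% and 98% e.e. values quoted above. The function name is an illustrative choice.

```python
def enantiospecificity(ee_product: float, ee_starting_material: float) -> float:
    """Enantiospecificity in percent: how much of the starting e.e. is transferred."""
    if not 0 < ee_starting_material <= 100:
        raise ValueError("starting-material e.e. must be in (0, 100]")
    return 100.0 * ee_product / ee_starting_material


es = enantiospecificity(96.0, 98.0)
print(f"{es:.1f}%")  # 98.0%
```

On this measure, only about 2% of the stereochemical information is lost in the coupling.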

Furthermore, in a subsequent publication, the Jarvo group demonstrated
the Suzuki-Miyaura cross-coupling of benzylic esters, carbonates and carbamates
with arylboronic esters (Fig. 5b). A striking feature of this method
is that it affords retention (17a) or inversion (17b) of the starting stereochemistry
based on the ligand employed: the use of tricyclohexylphosphine
(PCy3) provides retention of stereochemistry, whereas using an N-heterocyclic
carbene ligand (SIMes; Fig. 5b) provides inversion. In both cases, the enantiospecificity
is greater than 97% across a wide range of substrates. Simultaneously
with the disclosure from the Jarvo group, Watson and co-workers
published a closely related method for the synthesis of diarylalkanes and
triarylmethanes beginning from benzylic pivalates (18) and boroxines (13)
(Fig. 5c). In contrast to Jarvo’s method, however, it was found that no external
phosphine or carbene ligand was necessary to obtain good stereospecificity:
Ni(cod)2 alone was demonstrated to be an effective catalyst and
provided inversion of configuration at the benzylic position.

In 2012, Shi and co-workers reported the direct cross-coupling of benzylic
alcohols (20) with Grignard reagents (21) to provide diarylmethanes or alkyl
arenes with a catalyst system composed of (PCy3)2NiCl2 and PCy3 (ref. 42;
Fig. 5d). The magnesium alkoxide, the active coupling partner, is pre-formed
by addition of MeMgBr, which is also used to activate the Ni(II) precatalyst
by reducing it to Ni(0) via a sequence of two successive transmetallations and
reductive elimination of ethane. Subsequent addition of the Grignard reagent
initiates the reaction to form the desired coupling product.

Whereas all reactions so far described involve cleavage of C-O bonds,
benzylic halides are also useful substrates for this type of activation. One
clever example is from Martin and co-workers, who transformed benzylic
halides to the corresponding phenylacetic acids using carbon dioxide as the
carbon source. Subsequently, this methodology was extended to aryl and
benzylic pivalates to synthesize benzoates and phenylacetic acids. Additionally,
the activation of benzylic ammonium salts has recently been demonstrated.
Figure 5 | Reactions of benzylic alcohols and alcohol derivatives.

a, Stereospecific methylation of benzylic ethers. A nickel catalyst comprising
Ni(cod)2 and rac-BINAP was found to catalyse the methylation of benzylic
methyl ethers (14) to form alkyl-substituted arenes (15). A modification for the
synthesis of diarylethanes was also devised, allowing the synthesis of the
anti-cancer agent 16 in 69% yield and 96% e.e. b, A Suzuki-Miyaura-type
arylation of benzylic esters, carbonates and carbamates. The synthesis of
triarylmethanes (17) can be achieved by catalytic Ni(cod)2 and PCy3 or
SIMes; the stereoselectivity (retention or inversion, respectively 17a or 17b) is
determined by the identity of the ligand. c, A phosphine- and carbene-free
nickel catalyst was also developed, yielding inversion of the stereochemistry of
benzylic pivalates (18) to provide access to diarylalkanes (19). d, Cross-coupling
of free benzylic alcohols. An excess of organomagnesium reagent
(21) can be added to form a magnesium alkoxide, which is then a competent
coupling partner for the Kumada-type coupling with organomagnesium
reagents. rac, racemic; BINAP, 2,2’-bis(diphenylphosphino)-1,1’-binaphthyl;
SIMes, 1,3-bis(2,4,6-trimethylphenyl)-4,5-dihydroimidazol-2-ylidene;
Bu, butyl.


Cross-coupling of aziridines

The activation of C-N bonds, specifically those of aziridines, by zero-valent
nickel has been known for a number of years. This activation is facile, and,
intriguingly, it can be reversed on exposure to molecular oxygen to again
afford the original aziridine. However, it was not until a full decade later
that this mode of activation was successfully incorporated into a catalytic
coupling reaction. Doyle and co-workers succeeded in coupling alkyl and
arylzinc halides (23) to styrene-derived N-tosyl aziridines (22), achieving
aziridine ring opening with excellent selectivity for addition at the more
substituted position of the aziridine (24) (Fig. 6a). Critical to the success of
this approach was the use of dimethyl fumarate as an additive in place of
the phosphine, carbene and/or amine ligands often used in nickel catalysts.
Dimethyl fumarate is believed to accelerate reductive elimination through
π-coordination to nickel. Alkyl-substituted aziridines, unfortunately, were
found to be unsuitable substrates for this set of conditions. However, the use
of the dual-purpose cinsyl group, which functions both as a removable protecting
group and a directing group, enabled the use of aliphatic aziridines
(25). In this way, good selectivity (2.5:1 to 4.9:1) for substitution at the
less substituted position could be achieved with moderate to high yields
(Fig. 6b).

Figure 6 | Nickel-catalysed Negishi-type cross-coupling of aromatic and
aliphatic aziridines. a, Nickel-catalysed addition of organozinc halides to
styrenyl aziridines (22) occurs with incorporation of the nucleophile (23) at
the substituted position of the aziridine to furnish β,β-disubstituted
sulphonamides (24). b, Nickel-catalysed addition of organozinc halides (23) to
aliphatic aziridines (25) directed by the cinsyl group (Cn, shown in dashed
box), which imparts a preference for addition at the less substituted position of
the aziridine. Ts, (4-methyl)phenylsulphonyl; DME, 1,2-dimethoxyethane;
DMA, dimethylacetamide.
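
The regioisomer ratios quoted for the cinsyl-directed aziridine coupling (2.5:1 to 4.9:1) are easier to compare with yields once converted to percentages of the product mixture. The Python sketch below does that conversion; the function name is an illustrative choice.

```python
def ratio_to_percentages(major: float, minor: float = 1.0):
    """Convert a major:minor isomer ratio into percentage shares of the mixture."""
    if major <= 0 or minor <= 0:
        raise ValueError("ratio components must be positive")
    total = major + minor
    return 100.0 * major / total, 100.0 * minor / total


for major in (2.5, 4.9):
    share_major, share_minor = ratio_to_percentages(major)
    print(f"{major}:1 -> {share_major:.1f}% / {share_minor:.1f}%")
# 2.5:1 -> 71.4% / 28.6%
# 4.9:1 -> 83.1% / 16.9%
```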


Cross-coupling of sp³ halides

The utility of cross-coupling of aryl or vinyl electrophiles is unquestioned
with regard to scope, functional group compatibility, and relevance to the
arene-rich pharmacophores ubiquitous in modern drug molecules. However,
the number of C(sp³)-C(sp³) linkages found in natural products and
other complex organic molecules far outstrips the number of arene-carbon
linkages. But C(sp³)-C(sp³) bond forming reactions, particularly those involving
tertiary or quaternary stereocentres, can be challenging, even with modern
methods. In the past decade, a large amount of progress has been made
towards the ultimate goal of forming aliphatic carbon-carbon
bonds stereoselectively at will, just as the formation of biaryl or aryl-heteroatom
bonds has already been revolutionized by the field of cross-coupling.

Some key challenges are associated with any cross-coupling in which the
electrophile (that is, the component that undergoes oxidative addition to
the metal catalyst) is sp³ hybridized. First, the activation energy for oxidative
addition can be large, given that C(sp³)-X bonds are more electron-rich
than C(sp²)-X bonds. For primary electrophiles, oxidative addition can
proceed by an SN2-like inversion pathway, but for secondary or tertiary electrophiles,
this pathway is very slow. Then, once the carbon-metal bond
has been formed, the challenge is to suppress intramolecular β-hydride elimination,
which would produce an alkene product. Transmetallation and/or
reductive elimination, therefore, must proceed more rapidly than β-hydride
elimination. Cases in which β-hydrogens are not available or not of the
correct geometry for elimination, such as in benzyl or adamantyl electrophiles,
as discussed previously, are also known but do not represent the
same level of difficulty.

Despite these obstacles, reactions of sp³ electrophiles were investigated
in the early days of cross-coupling. Suzuki reported the first Pd- or Ni-catalysed
C(sp³)-C(sp³) cross-coupling reaction, that of primary alkyl iodides with
alkyl boranes using Pd(PPh3)4, but significant amounts of reduction and
elimination products were also formed (reduction:elimination:desired
coupling, 27:9:50). Knochel then showed that nickel could be used to
successfully couple primary iodides with organozinc reagents using either
an intramolecularly tethered alkene or exogenous electron-poor alkene
to facilitate reductive elimination. Kambe reported an olefin-assisted Kumada
coupling of primary alkyl halides and tosylates, proposed to proceed via a
bis(η³-allyl)nickel catalyst formed by the coupling of two equivalents of
butadiene.

However, the modern era of C(sp³)-C(sp³) cross-coupling began with a
paper by Fu and co-workers in 2003, in which the coupling of secondary
alkyl bromides bearing β-hydrogens with organozinc reagents was reported
(Fig. 7a). The transition from previously used primary electrophiles to secondary
electrophiles was ground-breaking, because it opened the way to
asymmetric synthesis of tertiary or quaternary all-carbon stereocentres. The
chelating tridentate PyBOX ligand (28) was found to be essential, probably
in order to slow the rate of β-hydride elimination, which would require a
vacant coordination site. In addition to the Negishi reaction, the cross-coupling
of secondary electrophiles with aryl and vinyl boronic acids, aryl
silicon reagents and aryl tin reagents has been disclosed. Cross-coupling
reactions with a variety of C(sp³)-metal reagents can also be accomplished
using this chemistry; for example, the mild Suzuki coupling carried out at
room temperature shown in Fig. 7b. Finally, Fu and co-workers provided
the first examples of the cross-coupling of a tertiary alkyl electrophile (for
example, 32) with B2pin2 (pin, pinacolato; Fig. 7c) and later in a Suzuki
reaction. In stark contrast, palladium has only been reported to cross-couple
secondary sp³ electrophiles in a handful of specialized cases.

[Figure 7 scheme; reaction conditions recoverable from the original graphic:
a, 26 + BrZn-n-Nonyl (27): Ni(cod)2 (4 mol%), s-Bu-PyBOX 28 (8 mol%), DMA, RT, 20 h; 91%.
b, 29 + 9-BBN reagent 30: NiCl2·glyme (6 mol%), ligand 31 (8 mol%), KOt-Bu (1.2 equiv.), i-BuOH (2.0 equiv.), dioxane, RT; 83%.
c, 32: B2pin2 (1.5 equiv.), NiBr2·diglyme (10 mol%), ligand 33 (13 mol%), KOEt (1.4 equiv.), i-Pr2O/DMA, -10 °C; 80%.
d, n-BuMgCl: Ni-pincer 34 (3 mol%), DMA, -35 °C, 30 min; 78%.
e, X = I, Br, Cl: Ni-pincer 34 (5 mol%), CuI (3 mol%), [NaI or n-Bu4NI], Cs2CO3, dioxane, 100-140 °C; 83-89%.
PyBOX ligands: 28, R = s-Bu; 33, R = i-Pr; the chelating ligand suppresses β-H elimination.]

Figure 7 | Key representative examples of cross-coupling reactions
involving oxidative addition to sp³ carbon electrophiles. a, The first example
of a C(sp³)-C(sp³) cross-coupling reaction using a secondary electrophile.
Secondary bromides and iodides, such as 26, were coupled in a Negishi reaction
with primary alkyl zinc reagents, such as 27. It was proposed that the chelating
PyBOX ligand (28) blocks the open coordination site needed for undesired
β-hydride elimination. b, A mild (room temperature) Suzuki reaction of
secondary bromides (29) and primary alkylboranes (30). Previously used
bipyridyl or PyBOX ligands were unable to promote the transformation, so
diamino ligand 31 was used. c, The first cross-coupling reaction to use an
unactivated tertiary electrophile (32). In contrast to previous results, tertiary
electrophiles reacted with faster rates than secondary or primary electrophiles.
d, The Kumada coupling of primary alkyl bromides and iodides, and some
secondary alkyl iodides, with Grignard reagents was accomplished with the
Ni-pincer complex 34 with an amidobis(amine) ligand. Low temperatures
allow a wide range of functional groups (such as ketones and esters) to be
tolerated. e, The first example of a Ni-catalysed Sonogashira reaction of alkyl
electrophiles. β-Hydrogen-containing alkyl iodides, bromides and chlorides
could be used as the electrophile with a variety of terminal alkynes (35). 9-BBN,
9-borabicyclo[3.3.1]nonane; pin, pinacolato; glyme, 1,2-dimethoxyethane;
diglyme, bis(2-methoxyethyl) ether; Pr, propyl; Hex, hexyl.

Preliminary mechanistic investigations of these reactions are consistent
with an inner-sphere electron-transfer pathway for oxidative addition. For
these coupling reactions, Fu proposes a Ni(I)/Ni(III) cycle with a radical
oxidative addition pathway (Fig. 8a), although the Vicic group has demonstrated
that with redox-active ligands such as terpyridine, the transmetallated
species before oxidative addition is perhaps better thought of as
Ni(II) cationic species 39 bound to reduced radical anion ligands (Fig. 8a, in
square brackets). However, the first isolable and well-characterized Ni(III)
complex arising from a Ni(I)/Ni(III) oxidative addition was also recently
reported. In any case, these mechanistic manifolds clearly demonstrate
the ability of Ni to access various oxidation states in order to facilitate reactions
otherwise inaccessible with other group 10 metals.

Another prominent contributor to the field of C(sp³)-C(sp³) cross-coupling
has been Hu, who developed Ni-pincer complex 34. This complex has proven
catalytically active for C(sp³)-C(sp³) Kumada cross-couplings (Fig. 7d) as
well as the Sonogashira coupling of alkynes with primary alkyl iodides, bromides
and chlorides (Fig. 7e). Extensive mechanistic investigation has been
conducted with the former reaction (Fig. 8b), revealing a number of interesting
features. The active complex for the turnover-limiting step of transmetallation
appears to be the Ni-pincer complex coordinated to the Grignard
reagent (34’). A bimetallic oxidative addition of the primary alkyl halide
occurs via generation of an alkyl radical (43), which reacts with a different
Ni(II)-alkyl complex (41); the resulting species then undergoes reductive
elimination to form the product. The remaining unstable Ni(I) complex (45)
transfers an electron to a Ni(III) complex (42) to form two Ni(II) complexes
(41, 46), which can rejoin the catalytic cycle.
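
The electron bookkeeping of this bimetallic pathway can be checked with a short Python sketch that tracks two nickel centres. It rests on stated assumptions: the labels A and B are hypothetical, both centres are taken to start as Ni(II), and the radical-generating step is modelled as halogen abstraction by centre A, a detail the description above does not spell out.

```python
def bimetallic_sequence():
    """Track the formal oxidation states of two nickel centres, A and B."""
    ni = {"A": 2, "B": 2}        # assumption: both centres start as Ni(II)
    history = [dict(ni)]

    ni["A"] += 1                 # A abstracts X from R-X: Ni(III) plus an alkyl radical
    history.append(dict(ni))

    ni["B"] += 1                 # B, a Ni(II)-alkyl, captures the radical: Ni(III)
    history.append(dict(ni))

    ni["B"] -= 2                 # reductive elimination at B releases product: Ni(I)
    history.append(dict(ni))

    ni["B"] += 1                 # B gives one electron to A ...
    ni["A"] -= 1                 # ... so both centres return to Ni(II)
    history.append(dict(ni))
    return history


for states in bimetallic_sequence():
    print(states)
```

The final entry matches the first, which is the condition for both centres to re-enter the cycle.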

These examples of cross-coupling are impressive, but the promise of
secondary C(sp³)-C(sp³) cross-coupling lies in asymmetric reactions providing
tertiary and quaternary stereocentres with high enantioselectivity.
The Fu group has published numerous examples of this type of reactivity.
These approaches rely on a tethered directing group that can coordinate to
the nickel catalyst on oxidative addition to form a rigid complex such that,
on reductive elimination, enantioenriched products are generated (Fig. 9).
Because the electrophile probably undergoes a radical oxidative addition, both
enantiomers of the starting alkyl halide are converted through a common
planar radical intermediate. Therefore, a racemic alkyl halide can be used to
produce a highly enantioenriched product.
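
To make the stereoconvergence argument concrete, the Python sketch below converts between enantiomeric excess and the shares of the two enantiomers. The function names are illustrative, and the 90% e.e. value is used only as an example of a highly enantioenriched product.

```python
def ee_from_amounts(major: float, minor: float) -> float:
    """Enantiomeric excess in percent from the amounts of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * (major - minor) / total


def enantiomer_shares(ee_percent: float):
    """Percentage of major and minor enantiomer for a given e.e."""
    if not 0 <= ee_percent <= 100:
        raise ValueError("e.e. must lie between 0 and 100")
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major


print(ee_from_amounts(50, 50))   # 0.0, a racemic starting halide
print(enantiomer_shares(90.0))   # (95.0, 5.0)
```

A racemic halide (0% e.e.) passing through a common radical intermediate can therefore end up as a 95:5 mixture or better, because the catalyst rather than the substrate sets the product configuration.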

The first example of an asymmetric secondary C(sp³)-C(sp³) cross-coupling
was published in 2005: a Negishi reaction of α-bromo amides (47) with
organozinc reagents (Fig. 9a). The use of an (R)-PyBOX ligand (33) provided
good yields and excellent enantioselectivities (>90% e.e. in nearly all cases).
These conditions not only conferred good enantioselectivity, but also
provided selectivity for oxidative addition to the α-bromide in the presence
of other primary or secondary alkyl bromides.
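
As a point of reference for the selectivities quoted above, enantiomeric
excess (e.e.) is conventionally the difference between the amounts of the two
enantiomers divided by their sum. The short Python sketch below illustrates
that arithmetic only; the function names and the example amounts are
hypothetical and are not taken from the work discussed here.

```python
from typing import Tuple


def enantiomeric_excess(major: float, minor: float) -> float:
    """Return the e.e. in percent from the amounts of the two enantiomers."""
    total = major + minor
    if total <= 0:
        raise ValueError("the two amounts must sum to a positive value")
    return 100.0 * abs(major - minor) / total


def enantiomer_ratio_from_ee(ee_percent: float) -> Tuple[float, float]:
    """Convert an e.e. (percent) into a major:minor ratio that sums to 100."""
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("e.e. must lie between 0 and 100 percent")
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major


if __name__ == "__main__":
    # Hypothetical product mixture: 95 parts of one enantiomer, 5 of the other.
    print(enantiomeric_excess(95.0, 5.0))
    print(enantiomer_ratio_from_ee(90.0))
```

Under this definition, an e.e. above 90% corresponds to an enantiomer ratio
better than 95:5.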

Although amides are useful building blocks for further synthetic elab- 
oration, the need for a directing group does limit the generality of this 
approach. The Fu group has shown that a wide variety of directing groups 
can be used (Fig. 9b). Not only do various coordinating functional groups 
such as amides, esters, sulphones and alkenes work well in these reactions, 
but the tether length can vary as well. In fact, the Fu group has shown that 
an amide in conjunction with the chiral nickel catalyst can confer high 
levels of enantioselectivity even when the halide is in the 5-position, a full 
five atoms away from the amide nitrogen. Doubtless, as it develops, the
asymmetric cross-coupling reaction will become more and more general
and be used for challenging bond formation in the synthesis of natural
products and other more complex molecules.


Reductive cross-coupling 
Traditional cross-coupling reactions join electrophilic (for example, aryl
bromide) and nucleophilic (for example, aryl zinc) components. However,
nucleophilic reagents can be unstable, costly or cause undesired side
reactions. In contrast, reductive cross-coupling reactions join two
electrophilic components without the formation of a stoichiometric
organometallic species. The challenge inherent in reductive cross-coupling
processes is the differentiation between the two electrophiles in order to
suppress homodimerization of either component.

Recently, Weix and co-workers demonstrated an impressive solution to
these difficulties by performing the first catalytic reductive cross-coupling
to show high selectivities for cross-coupled products over dimerization
without resorting to the use of a large excess of one of the components
(Fig. 10a). By coupling an aryl iodide (49) and an alkyl iodide (50)
electrophile with NiI2·xH2O and both a bipyridyl and a phosphine ligand, as
well as a stoichiometric manganese reducing agent, high reactivity and
selectivity for cross-coupled products were observed. Later, a range of aryl
bromides and chlorides and alkyl bromides were shown to be reactive with zinc
as the reducing agent. Though Zn(0) is present, no organozinc species are
formed; rather, zinc (or manganese) acts as a reductant directly to the nickel
centre, accounting for the excellent functional group compatibility.

Figure 8 | Proposed mechanisms for C(sp³)-C(sp³) cross-coupling.
a, Mechanism proposed for Fu-type cross-coupling reactions in which ligands
are typically PyBOX or similar as shown (36). A Ni(I) complex (37) undergoes
transmetallation, then a radical oxidative addition of the electrophile, to
eventually form a Ni(III) complex (38), which can reductively eliminate the
coupled product. Nickel complexes with redox active ligands have been shown
to be perhaps better thought of as the species in brackets (39, 40), in which the
oxidative addition to the electrophile proceeds through ligand-centred rather
than metal-centred redox. Ni(II) compounds are used as precatalysts, allowing
for reactions to be set up outside the glove box. Reduction to Ni(I) presumably
occurs before the beginning of the catalytic cycle via reduction of Ni(II) to
Ni(0) (transmetallation/reductive elimination), then comproportionation of
Ni(0)/Ni(II). b, Mechanism proposed for the C(sp³)-C(sp³) Kumada cross-coupling
with Ni-pincer 34. Extensive mechanistic studies have shown that a more
complex bimetallic oxidative addition is operative, in which a Ni(II) complex
bound to an equivalent of Grignard reagent (34′) is the active complex for the
turnover-limiting transmetallation. SET, single electron transfer; R•, alkyl
radical; * indicates a coordination complex.


Figure 9 | Asymmetric C(sp³)-C(sp³) cross-coupling reactions. a, The first
example of an asymmetric C(sp³)-C(sp³) cross-coupling between a racemic
α-bromoamide (47) and aliphatic organozinc reagent. b, Proposed mechanism
for the coupling reaction. A racemic aliphatic halide adjacent to a directing
group (in large dashed box) undergoes oxidative addition. Since it is proposed
to proceed via an unligated aliphatic radical (48) as shown in Fig. 8a, the overall
oxidative addition is stereoconvergent, and thus the chiral ligand on nickel
dictates the stereochemistry of the product. Many classes of directing groups
and transmetallating reagents (dashed box at top right) have been successfully
reacted, generally relying on one of the three chiral ligand classes shown in the
dashed box at bottom right. Bn, benzyl; DMI, 1,3-dimethyl-2-imidazolidinone.

Figure 10 | Reductive cross-coupling reactions. a, The reductive cross-
coupling reaction of an aryl halide (49) with an alkyl halide (50) without the
intermediacy of an organozinc or organomanganese species. Extensive 
mechanistic studies have suggested that this method combines both polar (aryl 
halide) and radical chain (alkyl halide) formal oxidative addition mechanisms. 
Because the oxidation state of nickel is matched to each electrophile, 
homodimerization is suppressed. For details on possible methods of radical 
chain initiation, see ref. 77. b, First asymmetric acyl reductive cross-coupling. 
High enantioselectivities are obtained with bisoxazoline ligands such as 53. 
DMPU, 1,3-dimethyl-3,4,5,6-tetrahydro-2(1H)-pyrimidinone; DMBA, 
2,6-dimethylbenzoic acid; MS, molecular sieves. 




After extensive mechanistic investigation, Weix proposed the mechanism
in Fig. 10a. Nickel undergoes a polar oxidative addition to the aryl
halide followed by a radical chain generation of an alkyl radical. This radical
then undergoes coordination to form Ni(III) species 51 competent for
reductive elimination. In this way, the full potential of readily available nickel
oxidation states (see above) is realized, and the use of alkyl electrophiles
is allowed. Additionally, the properties of the halides are matched to the
capabilities of the catalyst to avoid side reactions: aryl halides more readily
undergo polar oxidative addition (Ni(0)/Ni(II)), and alkyl radicals are more
stable than aryl radicals (Ni(I)/Ni(III)). Close comparison of this mechanism
with the one shown in Fig. 8b also demonstrates that a number of complex
elementary steps are available with radical pathways, and the factors that
lead to certain patterns of reactivity are not yet fully understood.
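
The oxidation-state bookkeeping behind a cycle of this kind can be checked
with a few lines of Python. The sketch below is illustrative only: the list of
steps is an assumed reading of the proposed mechanism, and the last two steps
(halogen abstraction to generate the alkyl radical, and reduction by the
stoichiometric metal) are assumptions added to close the cycle rather than
details spelled out in the text. All names are hypothetical.

```python
from typing import List, Tuple

# Each entry is (description, change in the formal oxidation state of nickel).
# This step list is an assumption made for illustration.
CYCLE: List[Tuple[str, int]] = [
    ("polar oxidative addition of the aryl halide", +2),         # Ni(0) -> Ni(II)
    ("capture of the alkyl radical", +1),                        # Ni(II) -> Ni(III)
    ("reductive elimination of the cross-coupled product", -2),  # Ni(III) -> Ni(I)
    ("halogen abstraction from the alkyl halide", +1),           # Ni(I) -> Ni(II)
    ("reduction by the stoichiometric metal reductant", -2),     # Ni(II) -> Ni(0)
]


def trace_oxidation_states(start: int = 0) -> List[int]:
    """Return the nickel oxidation state before the cycle and after each step."""
    states = [start]
    for _description, change in CYCLE:
        states.append(states[-1] + change)
    return states


if __name__ == "__main__":
    states = trace_oxidation_states()
    for (description, _change), before, after in zip(CYCLE, states, states[1:]):
        print(f"Ni({before}) -> Ni({after}): {description}")
```

A closed catalytic cycle must return nickel to its starting oxidation state,
which is the property the trace makes easy to verify.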

The Reisman group has also developed an enantioselective reductive
coupling that affords α,α-disubstituted ketones (52) from the coupling of acyl
chlorides and secondary benzylic chlorides (Fig. 10b). Although extensive
mechanistic investigations have not been carried out, Reisman also
proposes a mechanism in which Mn(0) acts as a direct reductant to Ni. In any
case, further developments of asymmetric reductive cross-coupling would
be a valuable alternative to the use of traditional transmetallating agents for
cross-coupling.


C-H activation 
C-H activation, the direct functionalization of a hydrocarbon, presents 
several potential advantages over traditional cross-coupling in that pre- 
functionalization of the substrate, for example with a halogen, is not neces- 
sary. At the same time, however, controlling site-selectivity with C-H activation 
can be very challenging, as molecules often contain many C-H bonds with 
similar chemical properties. Though examples of nickel-mediated C-H
activation date to at least 1963, the development of catalytic C-H activation
methods using nickel is more recent.

One such example, from the research group of Itami, describes the
decarbonylative coupling of aryl and heteroaryl esters (55) and (benz)oxazoles
(54) (Fig. 11a). This reaction provides an unusual means of arylating
azoles, and adds to the existing decarbonylative/decarboxylative approaches
for C-H arylation. The yields of the isolated products are good to excellent
across a wide range of substituted esters and azoles, and, furthermore, the
air-stable precatalyst Ni(dcype)(CO)2 (56) can be used in place of Ni(cod)2
and dcype, making this reaction operationally simple (dcype,
1,2-bis(dicyclohexylphosphino)ethane). Itami used this method to execute an
expedient formal synthesis of muscoride A (58) in excellent yield. Furthermore,
a subsequent report from Itami and co-workers describes the isolation of
the key arylnickel(II) pivalate intermediate formed in couplings of this type.

Figure 11 | Selected examples of nickel-catalysed C-H activation reactions.
a, Benzoxazoles and benzothiazoles (54) are useful substrates for this
nickel-catalysed C-H activation reaction, which uses aryl esters (55) as the
electrophilic coupling partners to produce (hetero)biaryls (57). This
methodology was applied to the formal synthesis of muscoride A (58) to great
effect. b, Nickel-catalysed, chelation-assisted C-H activation reactions have
recently been developed. These reactions rely on a directing group to facilitate
addition of nickel into the C-H bond in the ortho position of a benzamide
(59, 61) or into the C-H bond of an adjacent aliphatic substituent (63).
dcype, 1,2-bis(dicyclohexylphosphino)ethane; OTf, triflate
(trifluoromethanesulphonate); DMF, dimethylformamide.

Figure 12 | Nickel-catalysed Heck reactions. a, Coupling of aryl triflates (64)
with electron-rich enol ethers (65) to obtain high selectivity for branched
products, which on acidic hydrolysis form ketones. Computational work
supports a cationic Heck pathway with catalyst regeneration as
turnover-limiting. b, The first Heck reaction highly selective for branched (br)
products with electronically unbiased (aliphatic) and non-chelating alkenes (67).
Again, it was proposed to proceed through a cationic Ni species (69; left dashed
box) to give high regioselectivity, and an air-stable precatalyst (68; left dashed
box) was developed to eliminate the need for an air-free technique.
c, Branch-selective Heck reaction for aryl electrophiles (70) with aliphatic
olefins (67). Bidentate ligand 71 (right dashed box) was key to both reactivity
of aryl electrophiles and suppression of undesired isomerization. Aryl chlorides
and other phenol-derived electrophiles can be used with the addition of TESOTf,
which is proposed to perform a counterion exchange in order to enter the cationic
Heck pathway. TESOTf, triethylsilyl trifluoromethanesulphonate; br/ln,
branched-to-linear product ratio; DABCO, 1,4-diazabicyclo[2.2.2]octane; r.r.,
regioisomeric ratio (the ratio of desired product to all other isomers).

In 2011, Chatani and co-workers reported the first example of a
nickel-catalysed, chelation-assisted C-H activation reaction. The reaction, an
oxidative cycloaddition of alkynes to aromatic amides to form 1-isoquinolones
60 (Fig. 11b), relies on a 2-pyridylmethyl group on the amide nitrogen to
function as a directing group. Subsequently, in 2013, Chatani and Tobisu
published the ortho C-H activation of a similar aryl amide system (61)
and the arylation of aliphatic C-H substituents (62), further expanding
the repertoire of nickel-catalysed C-H activation reactions. All three
reactions are hypothesized to occur by pre-coordination of nickel to the
1,2-diamine moiety of the directing group, followed by metallation to form 62,
after which the reaction paths diverge.


Heck reaction 

Many of the advantages of nickel catalysis can also be applied to the Heck
reaction. The Heck reaction is similar to cross-coupling reactions, but rather
than undergo transmetallation, the oxidative addition complex coordinates
an alkene. Subsequent migratory insertion and β-hydride elimination then
furnish a more substituted alkene. Computational work comparing Ni
and Pd for use in the Heck reaction has been carried out, with Guo and
co-workers proposing lower energy barriers for oxidative addition and migratory
insertion for Ni in contrast to faster β-hydride elimination and catalyst
regeneration for Pd (ref. 15). However, relatively little has been done to
truly capitalize on these features until quite recently.

Figure 13 | Prototypical reductive coupling reactions and use of new
reducing agents. a, Standard reductive coupling reaction. Oxidative
cyclization of two π-components forms a nickellacycle (72), which on
formation of a Ni-H bond with a reducing agent, undergoes reductive
elimination to form a new C-C σ-bond and a new C-H bond overall
(73). b, Use of methanol as a mild reducing agent via the intermediacy of a
hemiacetal. c, Use of isopropanol as a mild, external reducing agent, allowing
for the use of air-stable Ni(II) salts as precatalysts. IPr,
1,3-bis(2,6-diisopropylphenyl)-1,3-dihydro-2H-imidazol-2-ylidene.

Questions of regioselectivity are critical in the Heck reaction, because in 
theory migratory insertion can occur at either end of the alkene coupling 
partner. Traditionally, electron-poor alkenes such as styrenes and acrylates 
have been used, which confer high selectivity for addition to the terminal 
position of the olefin. Conditions under which a cationic Pd or Ni species is 
formed by dissociation of the halide or triflate component, rather than the 
phosphine, after oxidative addition can provide high selectivity for
electron-rich alkenes. An example of the latter using Ni catalysis is given by
Skrydstrup and co-workers, who demonstrated good selectivity for coupling of
aryl triflates (64) and enol ethers (65), which on subsequent hydrolysis
formed methyl ketones (66) (Fig. 12a).

However, electronically unbiased olefins had always given mixtures of 
branched and linear product isomers arising from addition of the elec- 
trophile to the internal and terminal positions of the alkene, respectively. 
In 2011, Jamison and co-workers reported the first highly selective Heck 
reaction for these olefins including ethylene and terminal aliphatic alkenes 
(67) (Fig. 12b). This reaction is proposed to proceed via a cationic Heck
pathway; it is suggested that the shorter Ni-ligand bond lengths make
steric differentiation between the H and alkyl substituents of the alkene
feasible. Later, an air-stable nickel precatalyst (68) was developed for this
reaction, which obviates the need for Ni(cod)2 (and therefore the use of a
glove box) and increases reaction rates. Finally, a branch-selective Heck
reaction of the more typically used aryl electrophiles (70) with
electronically unbiased olefins has recently been reported (Fig. 12c). A wide
range of electrophiles including aryl triflates, chlorides and other less reactive
sulphonates (mesylates, tosylates, sulphamates) can be used, underscoring the
ability of nickel to undergo oxidative addition to a broad range of cheap,
stable, traditionally unreactive electrophiles.
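
Regioselectivity in these Heck reactions is reported as a branched-to-linear
product ratio or as a regioisomeric ratio (desired product relative to all
other isomers). The minimal Python sketch below shows how the two figures
relate to measured product amounts; the function names and the example numbers
are hypothetical and do not come from the studies discussed.

```python
from typing import Sequence


def branched_to_linear(branched: float, linear: float) -> float:
    """Branched-to-linear ratio from the amounts of the two products."""
    if linear <= 0:
        raise ValueError("the linear amount must be positive to form a ratio")
    return branched / linear


def regioisomeric_ratio(desired: float, others: Sequence[float]) -> float:
    """Ratio of the desired product to the sum of all other isomers."""
    other_total = sum(others)
    if other_total <= 0:
        raise ValueError("at least one other isomer must be present")
    return desired / other_total


if __name__ == "__main__":
    # Hypothetical crude mixture: 95 parts branched, 5 parts linear.
    print(branched_to_linear(95.0, 5.0))
    # Hypothetical mixture: 90 parts desired isomer, 6 and 4 parts of two others.
    print(regioisomeric_ratio(90.0, [6.0, 4.0]))
```

The regioisomeric ratio is the stricter measure because alkene isomerization
products count against the desired isomer as well as the linear product.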


Reductive coupling 


In addition to cross-coupling, one of the prototypical nickel-catalysed
reactions is reductive coupling. Reductive coupling involves the joining of two
π-components with a reducing agent to form a new σ-bond between the
coupling partners and a new C-H σ-bond arising from the reducing agent
(Fig. 13a). The reaction is generally accepted to proceed via a concerted
oxidative cyclization, followed by either coordination of the reducing agent
to Ni or σ-bond metathesis to provide a nickel hydride, and terminated by
reductive elimination. Several excellent reviews of the state of the field before
2005 exist, including couplings of both alkenes and alkynes.

In more recent years, there have been two key developments in this field.
The first development involves the search for milder reducing agents.
Traditionally, reactive hydride donors such as alkyl silanes or trialkyl boranes
have been used. However, in 2008, Montgomery and co-workers reported
that methanol could be used to facilitate β-hydride elimination and provide
the nickel hydride directly from an aldehyde in internal redox to form 74
(Fig. 13b). A few years later, Jamison and Beaver reported that by using
alcohols directly as the source of the requisite hydride, air-stable Ni(II) salts
(reduced under the reaction conditions) could be used to carry out
reductive couplings between epoxides and tethered alkynes (75) (Fig. 13c).

The second development has addressed a major challenge inherent in
reductive coupling reactions: the question of regioselectivity (Fig. 14). For
strongly electronically biased coupling components, such as aryl-alkyl alkynes,
regioselectivity is controlled by these elements. However, dialkyl alkynes pose
a much tougher problem. The first high levels of regiocontrol were achieved
by the use of alkynes with a pendant alkene, which could act as a ligand to Ni
after coordination to the alkyne (Fig. 14a). Then, by adjusting the ligand
sphere around Ni by the addition of phosphine ligands, either position of
the alkyne (A or B in Fig. 14a) could undergo reaction with the aldehyde.
Montgomery and co-workers subsequently discovered that by alteration of
the steric bulk of carbene ligand substituents, the selectivity in the reductive
coupling of internal dialkyl alkynes and aldehydes could be reversed (Fig. 14b).
Subsequent computational studies suggested that the orientation of the
groups bound to nitrogen was important, particularly in filling the
quadrant proximal to the alkyne below the plane of the Ni-ligand bond. These
impressive results demonstrate the fine-tuned control over these systems
that can now be achieved.

Figure 14 | Two strategies for regiocontrol in reductive coupling reactions.
a, A tethered alkene can be used to control regioselectivity in coupling an
alkyne with an aldehyde. The alkyne oxidatively adds to Ni to produce
nickellacyclopropene 76. Then, by selective displacement by added phosphine
ligands, or with no additive, the binding and orientation of the aldehyde is
controlled to produce either linear (A) or branched (B) products. b, The steric
profile of carbene ligands (small substituents, S, compared to large
substituents, L) can be used to control the regioselectivity of an
alkyne-aldehyde reductive coupling. Computational work suggests that
unfavourable steric interactions between the groups on the alkyne with either
the group on the aldehyde or the groups on the ligand dictate the orientation
of the forming five-membered nickellacycle intermediate (shown in brackets).
EtOAc, ethyl acetate; Cyp, cyclopentyl.


Looking forward

Although the field of nickel catalysis has rapidly expanded over the last decade,
there are many challenges that remain to be overcome. Through extensive
mechanistic studies, including those described above, researchers now
understand a great deal more about the elementary steps and oxidation states of
nickel in a number of varying reaction manifolds. In many instances, the picture
presented is more complex than originally envisioned, often due to the easy
access of nickel to multiple oxidation states, and thus catalytic pathways.
With this increased understanding, we expect future efforts to be directed 
towards the design of new catalysts and the development of new transfor- 
mations that accomplish even more complex bond-forming reactions. The 
activation of simple electrophiles for cross-coupling, such as C-H bonds or 
phenol derivatives, the formation of bonds difficult to access with current 
methodology, such as carbon-fluorine bonds, and the use of coupling partners 
currently considered challenging, such as carbon dioxide (CO2), are all areas
that will benefit from further development and mechanistic understanding. 

In addition, we expect to see further developments in the area of
C(sp³)-C(sp³) bond formation, particularly in expansion of substrate scope and
application to the synthesis of complex molecules. Although there are a few
examples of the cross-coupling of sp³ electrophiles used in total synthesis,
the adjustment of synthetic strategy has perhaps not yet occurred to enable
their use as a foundational tool in organic chemistry. Another conspicuous
absence is a general method for the Heck reaction of sp³ electrophiles.
This reaction is particularly challenging because β-H elimination of the
electrophile before coupling must be suppressed, but in order to form the
desired alkene product, the measures taken to do so must not prohibit a
β-H elimination after C-C bond formation.

Finally, we expect great strides in the development of low-cost, air-stable, 
and easier-to-handle sources of nickel for catalysis. The use of the most
common Ni(0) source, Ni(cod)2, requires the use of a glove box, and although
in some reactions inexpensive nickel halide sources can be used with sub- 
sequent reduction accomplished in situ, the development of different modes 
of activation for nickel precatalysts could lead to wider adoption of nickel 
catalysis, in both academic and industrial laboratories. All in all, we expect 
that nickel catalysis will prove a fertile field of study well into the future 
as chemists continue to address even more challenging problems of reac- 
tivity and rapid assembly of complex molecules. We also hope that nickel 
will continue to gain recognition, not as an inexpensive substitute for 
palladium, but rather as possessing a number of inherent properties that 
provide a complement to catalysis by other metals.


Nucleotide signalling during inflammation


Marco Idzko, Davide Ferrari & Holger K. Eltzschig


Inflammatory conditions are associated with the extracellular release of nucleotides, particularly ATP. In the extracellular 
compartment, ATP predominantly functions as a signalling molecule through the activation of purinergic P2 receptors. 
Metabotropic P2Y receptors are G-protein-coupled, whereas ionotropic P2X receptors are ATP-gated ion channels. Here 
we discuss how signalling events through P2 receptors alter the outcomes of inflammatory or infectious diseases. Recent 
studies implicate a role for P2X/P2Y signalling in mounting appropriate inflammatory responses critical for host defence 
against invading pathogens or tumours. Conversely, P2X/P2Y signalling can promote chronic inflammation during ischae- 
mia and reperfusion injury, inflammatory bowel disease or acute and chronic diseases of the lungs. Although nucleotide 
signalling has been used clinically in patients before, research indicates an expanding field of opportunities for specifi- 
cally targeting individual P2 receptors for the treatment of inflammatory or infectious diseases. 


Nucleotides, particularly ATP, are well known for their function as a
universal energy currency. Interestingly, ATP has a completely different role
in the extracellular compartment, where it
functions as a signalling molecule through the activation of nucleotide
receptors. These receptors are referred to as purinergic P2 receptors. In
contrast to P1 receptors, which are activated by the ATP metabolite
adenosine, P2 receptors are activated by ATP and/or other nucleotides (for
example, UTP). On the basis of their signalling properties, P2 receptors
can be further subdivided into metabotropic P2Y receptors (P2YRs) that
are G-protein-coupled, and ionotropic P2X receptors (P2XRs) that are
nucleotide-gated ion channels. Although P2 receptors were originally
described on the basis of their functional role in the central nervous system,
more recent studies demonstrate their widespread expression throughout
different tissues (Supplementary Table 1) and implicate them in innate or
adaptive immune responses.


Cellular ATP release during inflammatory conditions 


During certain conditions (for example, inflammatory, ischaemic and
hypoxic conditions), several cell types release ATP from intracellular storage
pools into the extracellular compartment. Although ATP release can occur
in an uncontrolled fashion (for example, during necrosis), many studies
have examined molecular pathways that control extracellular ATP release.
For example, inflammatory cells can release ATP via pannexins or
connexin hemichannels. Pannexins, transmembrane protein channels that
connect the intracellular with the extracellular space, have been
implicated in the release of ATP from apoptotic cells, and other studies have
implicated connexins in extracellular nucleotide release. Connexins were
originally described as gap junction proteins consisting of two
hemichannels. However, isolated hemichannels (connexons) can function as
conduits between the cytoplasm and the extracellular space, thereby controlling
ATP release, for example from inflammatory cells or vascular endothelia. Other
studies found that the release of uridine nucleotides such as UTP, UDP
and UDP-glucose is increased during cystic fibrosis. Together, these
studies indicate that inflammatory disease conditions are associated with
the extracellular release of nucleotides.


Molecular structure and signalling cascade of P2YR 
P2YR belongs to the G-protein-coupled receptor (GPCR) family and
contains an extracellular amino terminus, an intracellular carboxy terminus and
seven transmembrane-spanning motifs (Fig. 1). At present, eight distinct
mammalian P2YRs have been cloned and characterized (P2Y1/2/4/6/11/12/13/14R).
The missing numbers represent either non-mammalian receptors (P2Y3R
is the chicken orthologue of human P2Y6R) or other GPCRs that share
some sequence homology with P2YRs but cannot be activated by nucleo- 
tides (for example, lysophosphatidic acid is a P2Y)R agonist)’. According to 
their phylogenetic and sequence divergence, two distinct P2YR subgroups
have been proposed. The first group includes the P2Y1/2/4/6/11R subtypes,
with a sequence homology of 35-52% in amino acid composition and the
presence of a Y-Q/K-X-X-R defining motif in the transmembrane α-helix
7, thus affecting ligand-binding characteristics. The second group contains
P2Y12/13/14R, with members sharing a sequence homology of 47-48% and
the presence of the K-E-X-X-L motif in transmembrane α-helix 7 (ref. 11).
There is some evidence suggesting that the two P2YR subgroups also differ
in their primary coupling to G proteins: the P2Y1/2/4/6/11R group is coupled
to Gq/G11 (leading to calcium release via phospholipase C/inositol-1,4,5-
trisphosphate activation). By contrast, P2Y12/13/14R bind to Gi/o proteins,
which inhibit adenylate cyclase and modulate flow through ion channels.
However, there are several instances in which other signalling pathways 
have been identified. For example, the discrepancy between structural- 
group affiliation and functional characteristics is highlighted by P2Y.R. 
Despite having only 20-25% sequence homology with P2Y1/2/4/6R, there
is considerable functional similarity. Indeed, P2Y, and P2Y,R can both 
activate monomeric G-proteins (such as Rac and/or RhoA), and are the
only P2YR subtypes that exhibit agonist-induced desensitization through
GPCR kinases. These studies indicate that despite some sequence
homology among P2YRs, there are marked differences between individual
members of the P2YR family regarding their intracellular signalling cascades.
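
The two proposed P2YR subgroups can be summarized as a small lookup
structure. The Python sketch below is only an illustration of the grouping
described in this section: the class and function names are hypothetical, the
motifs are translated into regular expressions on the assumption that X stands
for any single residue, and the example sequences in the usage block are
invented rather than real receptor sequences.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class P2YSubgroup:
    name: str
    members: Tuple[str, ...]
    tm7_motif: re.Pattern  # motif in transmembrane alpha-helix 7
    g_proteins: str        # primary G-protein coupling


SUBGROUPS = (
    P2YSubgroup(
        name="first group",
        members=("P2Y1", "P2Y2", "P2Y4", "P2Y6", "P2Y11"),
        tm7_motif=re.compile(r"Y[QK][A-Z]{2}R"),  # Y-Q/K-X-X-R
        g_proteins="Gq/G11",
    ),
    P2YSubgroup(
        name="second group",
        members=("P2Y12", "P2Y13", "P2Y14"),
        tm7_motif=re.compile(r"KE[A-Z]{2}L"),  # K-E-X-X-L
        g_proteins="Gi/o",
    ),
)


def subgroup_of(receptor: str) -> Optional[P2YSubgroup]:
    """Look up the subgroup of a receptor by name, or None if it is not listed."""
    for group in SUBGROUPS:
        if receptor in group.members:
            return group
    return None


def subgroup_from_tm7(sequence: str) -> Optional[P2YSubgroup]:
    """Assign a subgroup from a helix-7 sequence if exactly one motif matches."""
    hits = [g for g in SUBGROUPS if g.tm7_motif.search(sequence.upper())]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    print(subgroup_of("P2Y12").g_proteins)
    # Invented peptide fragments, used only to exercise the motif patterns.
    print(subgroup_from_tm7("LLYKSTRAA").name)
    print(subgroup_from_tm7("GGKELTLFF").name)
```

A lookup of this kind captures only the primary coupling; as noted above,
individual receptors can signal through additional pathways.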


Endogenous ligands for P2YR 

Figure 1 | Extracellular nucleotide release and signalling during
inflammation. During inflammation, multiple cell types release nucleotides,
for example ATP or ADP, from their intracellular compartments into the
extracellular space. Nucleotides can be released during mechanical injury,
necrosis, apoptosis or inflammatory cell activation. Several molecular pathways
have been implicated in this process, such as vesicular ADP release from
platelets, pannexin-mediated ATP release during apoptosis, and connexin-
or pannexin-mediated ATP release from inflammatory cells, such as
neutrophils. Extracellular nucleotides function as signalling molecules through
the activation of purinergic P2 receptors. These receptors can be grouped
into metabotropic P2Y receptors (P2YRs; GPCRs with seven
transmembrane-spanning motifs) or ionotropic P2X receptors (P2XRs),
which are nucleotide-gated ion channels. Each P2XR is formed by three
subunits (P2XR monomers), each of which consists of two transmembrane
regions, TM1 and TM2. Binding of three molecules of ATP to the assembled
P2X channel causes opening of a central pore. These conformational changes
allow for flux of ions such as sodium (Na+), calcium (Ca2+) and potassium
(K+) across the membrane. ATP signalling is terminated by the enzymatic
conversion of ATP to adenosine through the ectonucleoside triphosphate
diphosphohydrolase CD39 (conversion of ATP/ADP to AMP) and the
ecto-5′-nucleotidase CD73 (conversion of AMP to adenosine). Similar to ATP,
adenosine (A) functions as an extracellular signalling molecule through the
activation of purinergic P1 adenosine receptors.

The most abundant and best-characterized endogenous ligand for P2YR
is the nucleotide ATP. ATP binds to all P2YRs except P2Y6R and P2Y14R.
Its binding characteristics exemplify the complexity of P2YR signalling: at
low concentrations it is the only native agonist for P2Y11R, but at higher
concentrations it functions as a partial agonist for P2Y1R and P2Y13R, or as
an antagonist for human P2Y4R or P2Y12R. Other nucleotides, such
as ADP, UTP, UDP or UDP-glucose, exhibit more specificity for individual
P2YRs. For example, ADP activates P2Y1R, P2Y12R and P2Y13R, whereas
UTP primarily binds to P2Y2R and P2Y4R, and to a lesser extent to P2Y6R,
for which UDP is the preferred native ligand. P2Y14R is predominantly
activated by UDP-glucose and other UDP-sugars, and to a lesser degree
by UDP. Indeed, the capacity of different nucleotides to bind
specifically to individual P2YRs, or to act as either agonists or antagonists,
highlights the complexity of the P2Y system and suggests non-redundant
signalling pathways.
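
The ligand preferences discussed in this section lend themselves to a simple
mapping from nucleotide to receptor. The Python sketch below is a deliberately
simplified illustration: it lists only the primary receptor targets of the
more selective nucleotides, using receptor subscripts that follow standard P2Y
nomenclature (an assumption where the source text is unclear), and it omits
ATP and all secondary or partial activities. The names are hypothetical.

```python
from typing import Dict, List, Tuple

# Primary receptor targets of the more selective nucleotides (simplified;
# weaker secondary activities are intentionally left out).
PREFERRED_RECEPTORS: Dict[str, Tuple[str, ...]] = {
    "ADP": ("P2Y1", "P2Y12", "P2Y13"),
    "UTP": ("P2Y2", "P2Y4"),
    "UDP": ("P2Y6",),
    "UDP-glucose": ("P2Y14",),
}


def ligands_for(receptor: str) -> List[str]:
    """Return the nucleotides listed as primary ligands for a receptor."""
    return sorted(
        ligand
        for ligand, receptors in PREFERRED_RECEPTORS.items()
        if receptor in receptors
    )


if __name__ == "__main__":
    for receptor in ("P2Y2", "P2Y6", "P2Y13"):
        print(receptor, ligands_for(receptor))
```

In this simplified table no receptor is shared between two nucleotides, which
mirrors the non-redundancy the text describes.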


Pharmacological compounds that act on P2YR 

Because P2YRs have crucial roles in regulating immune
responses, they became an obvious pharmacological target for the
treatment of inflammatory or infectious diseases. Interestingly, the parasiticide
suramin, which was widely used in the 1920s for the treatment of human
onchocerciasis and trypanosomiasis, was later found to be a nonspecific
inhibitor of P2YR and P2XR. Although many P2YR-subtype-specific
agonists or antagonists have been characterized in in vitro assays or animal
studies of inflammatory disorders, presently only two types of
P2YR-specific compounds are used in patients: antithrombotic P2Y12R
antagonists (for example, clopidogrel) and the P2Y2R agonist denufosol, which
was examined for the treatment of cystic fibrosis, but eventually failed in
clinical trials. One of the important future challenges for targeting P2YR
signalling in patients will include the development of highly selective P2YR 
antagonists, or specific combined P2R antagonists (for example, a P2Y,/ 
P2Y,/P2X,R antagonist), which could be used for the treatment of chronic 
inflammatory disorders. 


Functional roles of P2YR in unchallenged mice 

Mice with genetic deletions for human P2YR homologous genes have 
been generated and characterized, with the exception of P2RY11, which 
is not expressed in mice. Despite their widespread expression and their
functional involvement in many diseases, mice with global deletions for
individual P2YRs display only mild phenotypical alterations when
maintained unchallenged in a germ-free environment. For instance, P2ry2−/−
mice have slightly lower plasma concentrations of aldosterone, renin and
potassium, whereas global deletion of P2ry4 is associated with lower
exercise capacity and reduced myocardial hypertrophy during a swimming
exercise. These findings indicate the likelihood of some redundancy in
the signalling system, or compensatory mechanisms following global P2ry 
gene deletion. 


P2YR signalling during inflammatory disease states 


Several studies over the past decade have highlighted fundamental roles 
for P2YRs during inflammatory and infectious diseases. In particular,
signalling events through P2Y2/6/12R have shaped an ambivalent view of their
function as either friend or foe during inflammation.


P2Y2R 

An early attempt at targeting P2Y signalling for the treatment of
inflammatory disorders came from studies of P2Y2R agonists for the treatment
of cystic fibrosis. Cystic fibrosis is a life-shortening disease that affects
over 30,000 children and adults in the United States. The airways of patients
with cystic fibrosis are susceptible to infection, characterized by
neutrophilic inflammation. Although neutrophil proteases are critical for
killing engulfed bacteria, neutrophil elastase accumulates in the airways of
cystic fibrosis patients, impairing ciliary function, crippling bacterial
clearance and degrading structural proteins. From a molecular perspective,
cystic fibrosis is characterized by a defect in the cystic fibrosis
transmembrane conductance regulator gene, causing hyperabsorption of sodium,
which leads to thickening of mucus, reduced mucociliary clearance and
concomitant increases in susceptibility to bacterial infection. Several
studies have indicated that P2Y2R agonists can induce chloride secretion
through inhibition of the epithelial sodium channel ENaC, activation of
calcium-dependent chloride channels, stimulation of mucin production,
surfactant secretion and ciliary beating (Fig. 2). These observations were
followed by the development of the P2Y2R agonist denufosol for the treatment
of patients suffering from cystic fibrosis. In 2005 denufosol entered
clinical trials, and a 28-day intervention study in a small cohort indicated
a potential benefit in lung function of cystic fibrosis patients.
Unfortunately, long-term follow-up (48 weeks) in 466 patients was not
associated with improved pulmonary function or reduction of pulmonary
exacerbations. These disappointing findings may be related to an
inflammatory role for P2Y2R signalling (see below).

In addition to enhancing mucociliary clearance, P2Y2R agonists have been
implicated in the promotion of wound healing. In this context, P2Y2R
signalling mediates the recruitment of leukocytes to the site of tissue
damage as well as differentiation and proliferation of structural cells.
Moreover, ATP release and concomitant P2Y2R signalling have been identified
as a ‘find-me’ signal for leukocytes, promoting phagocytic clearance of
apoptotic cells or bacteria by macrophages and neutrophils, thereby
contributing to the resolution of inflammation (Fig. 2).

Figure 2 | P2Y2R signalling during injury resolution and chronic
inflammation. P2Y2R signalling on phagocytes, such as macrophages,
contributes to the clearance of apoptotic cells, which release the P2Y2R
agonist ATP as a ‘find-me’ signal. P2Y2R-mediated clearance of apoptotic
cells and debris contributes to wound healing. Activation of P2Y2R by UTP or
ATP promotes mucociliary clearance in the airways via inhibition of the
epithelial sodium channel (ENaC), which is associated with concomitant
increases in mucin production, surfactant secretion and ciliary beating.
Neutrophil-dependent ATP release and autocrine activation of P2Y2R
contribute to purinergic chemotaxis, thereby enhancing bacterial clearance
during pneumonia. On the other hand, P2Y2R-mediated release of IL-8 and
neutrophil elastase (NE) from neutrophils contributes to the pathogenesis of
chronic obstructive lung disease (COPD). ATP-elicited P2Y2R signalling on
alveolar epithelial cells or eosinophils causes production of pro-allergic
mediators (for example, IL-33, IL-8, eosinophil cationic protein) during
allergic airway disease. Similarly, P2Y2R signalling on dendritic cells has a
role during the induction and self-perpetuation of asthma.

Other studies have indicated that P2Y2R signalling contributes to
fundamental leukocyte functions such as migration and mediator production
by neutrophils, eosinophils, dendritic cells (DCs) and macrophages.
For example, migrating neutrophils can release ATP from their leading
edge to amplify chemotactic signals and direct cell orientation by feedback
signalling involving P2Y2R (Fig. 2). In addition, danger signals such as
uric acid, complement factor 5a, Toll-like receptor ligands and
interleukin (IL)-8 are stimulated by an autocrine ATP-P2Y2R loop to modulate
migration and cytokine production of neutrophils or eosinophils.
Several studies point towards an ambivalent function of P2Y2R in
neutrophilic inflammation: although P2ry2−/− mice are less capable of
containing bacterial infections, inappropriate activation of P2Y2R is
associated with neutrophil-induced hyperinflammation and tissue damage during
sepsis, chronic lung disease and hepatitis (Fig. 2). For example,
neutrophils from patients with chronic obstructive pulmonary disease express
higher levels of P2Y2R, which is associated with higher elastolytic activity
and migration capacity upon ATP stimulation compared to healthy controls.
In the context of P2Y2R agonists for the treatment of cystic fibrosis,
these findings could explain why the long-term use of inhaled denufosol in
cystic fibrosis patients failed to improve clinical outcomes, as it may have
been associated with enhanced neutrophil activation and increased lung
inflammation, thus outweighing the beneficial effects of improved
mucociliary clearance.

Studies in murine models of asthma or contact hypersensitivity
demonstrate a contribution of P2Y2R to the induction and self-perpetuation
of allergic diseases. Allergen provocation leads to ATP release and
concomitant signalling through P2Y2R, thus favouring the recruitment of
immature myeloid DCs and eosinophils to the site of allergen exposure. This
is associated with the production of pro-allergic mediators (for example,
IL-33, IL-8, eosinophil cationic protein) from different cellular sources
(Fig. 2). Similarly, studies in humans indicate that P2Y2R-induced
migration and production of reactive oxygen species are enhanced in
immature monocyte-derived DCs and eosinophils from allergic donors in the
context of concomitant increases in P2Y2R expression.

Taken together, these findings suggest that ATP-elicited activation of
P2Y2R can function as a ‘friend’ in the defence against bacterial infections,
promotion of wound healing or enhancement of mucociliary clearance
mechanisms. By contrast, it can also lead to uncontrolled inflammation,
attenuated resolution, promotion of chronic inflammatory disease states
and fibrotic remodelling. Indeed, P2Y2R antagonists (as opposed to P2Y2R
agonists) could evolve as useful drugs for the treatment of chronic
inflammatory diseases, such as chronic obstructive pulmonary disease (Fig. 2).


P2Y6R

P2Y6R is highly expressed on stromal cells and can be activated by UDP.
The ambivalent behaviour (‘friend or foe’) of P2Y2R signalling during
inflammatory diseases also applies to P2Y6R signalling. Recent in vivo
studies demonstrate a role of P2Y6R in innate immune responses against
bacterial infection. P2Y6R activation triggers chemokine release from
monocytes, DCs, eosinophils and endothelial cells, thus promoting recruitment
of inflammatory cells towards the site of inflammation or infection.
Similarly, one study demonstrated that injured neurons release UTP and
UDP, causing the upregulation of P2Y6R expression on microglia, and
concomitant enhancement of their phagocytic capacity for dying cells.
UDP signalling through P2Y6R can therefore function as an ‘eat-me’ signal
for microglia, thereby initiating the clearance of dying cells or debris
in the central nervous system.

By contrast, P2Y6R signalling is detrimental in models of endothelial
or epithelial inflammation. Mucosal P2Y6R expression is increased during
experimentally induced intestinal inflammation such as occurs during
inflammatory bowel disease (IBD). IBD is a heterogeneous group of
disorders characterized by intestinal inflammation, including Crohn’s disease
and ulcerative colitis. Here, pharmacological inhibition or genetic deletion
of P2ry6 in murine models of intestinal inflammation is associated with
improved disease outcomes. Similarly, a functional role for P2Y6R
signalling in promoting detrimental inflammation has been reported for
chronic forms of lung disease, such as asthma. Indeed, its functional role
in promoting pathological airway inflammation during chronic lung disease
is an important concern when considering the use of P2Y6R agonists for the
treatment of cystic fibrosis as a means of enhancing mucociliary
clearance. The idea that UDP-elicited P2Y6R activation can lead to
self-perpetuating chronic inflammatory disorders is further supported by
recent in vivo findings suggesting a functional role of P2Y6R in promoting
atherosclerotic disease in murine models. Taken together, these studies
indicate that although activation of P2Y6R is important in initiating innate
immune responses after infection, inappropriate P2Y6R signalling,
predominantly on stromal cells, can drive detrimental immune responses in
chronic inflammatory disorders such as atherosclerosis, chronic lung disease
or IBD.


P2Y12R

P2Y12R, which is highly expressed on platelets, has fundamental roles in
platelet activation and aggregation. Stimulation of P2Y12R inhibits
adenylyl cyclase activity and increases phosphatidylinositol-3 kinase
activity, resulting in the activation of the fibrinogen receptor (integrin
αIIbβ3), which is critical for platelet aggregation. P2Y12R antagonists have
been used successfully for antithrombotic therapy in patients. Because
platelets are a key source of inflammatory mediators, P2Y12R signalling has
also been implicated in modulating inflammatory responses. The fact that
P2Y12R agonists trigger mediator release from platelets implies
inflammatory alterations in patients taking P2Y12R antagonists. Importantly,
P2Y12R antagonists such as clopidogrel or ticagrelor are clinically used in
patients as platelet inhibitors. Indeed, reduced levels of circulating
inflammatory mediators (for example, tumour-necrosis factor-α, C-reactive
protein, P-selectin) were found in patients receiving clopidogrel. Preclinical
studies confirm the proinflammatory role of P2Y12R signalling in models
of vascular inflammation and asthma. For instance, P2ry12−/− mice are
protected in models of atherosclerosis. Moreover, a surprising crosstalk
involving leukotrienes and P2Y12R has been described during asthma. In brief,
murine studies demonstrate that P2Y12R signalling on platelets is required
for the pro-asthmatic action of leukotriene LTE4 (ref. 61). Furthermore,
platelet-independent P2Y12R signalling events contribute to asthma, as
P2Y12R antagonists can directly block cysteinyl leukotriene-induced release
of eosinophil cationic protein from human eosinophils, and ADP-elicited
P2Y12R activation enhances the capacity of DCs to activate allergen-specific
T cells. The clinical observation that single nucleotide polymorphisms
of the P2RY12 gene are associated with altered lung function in a cohort
of asthmatic children provides additional evidence for a role of P2Y12R
signalling in human asthma. Taken together, these studies implicate P2Y12R
signalling in promoting chronic inflammatory disorders such as asthma
and atherosclerosis. However, additional clinical trials addressing the
efficacy of P2Y12R antagonists (such as clopidogrel or ticagrelor) for
the treatment or prevention of chronic inflammation will be critical in
establishing their clinical usefulness beyond current indications.


P2XR signalling during inflammation 

Molecular structure and signalling cascade of P2XR 

P2XRs are plasma membrane channels selective for monovalent and divalent
cations (Na⁺, K⁺, Ca²⁺) which are directly activated by extracellular
ATP (Fig. 1). Seven different subunits have been identified so far (P2X1–7R).
The primary sequence of P2XRs has no important sequence homology
with other ligand-gated ion channels, ATP-binding proteins or other known
proteins. P2XRs share a common topology with two transmembrane
domains (TM1 and TM2), a large extracellular loop responsible for ligand
binding, and an intracellular N terminus and a longer C terminus. The
extracellular ‘loop’ starts in proximity of position 52 and ends near
proline 329, so most of the P2XR protein protrudes from the plasma membrane.
Evidence suggests that functional P2XRs are trimers, with three peptide
subunits arranged around an ion-permeable channel pore, where ATP binding
promotes subunit rearrangement and ion channel opening. Three molecules
of ATP seem to bind to the extracellular portions of P2XR. Channel
opening induces transmembrane ion fluxes, that is, Na⁺ and Ca²⁺ influx
and K⁺ efflux, leading to plasma membrane depolarization and, owing to
the increase of intracellular Ca²⁺ levels, activation of Ca²⁺ signalling
cascades, such as p38 MAPK or phospholipase A2 activation. Interestingly,
P2X7R is capable of activating the NOD-like-receptor-mediated
inflammasome assembly with pro-caspase 1 proteolytic activation and subsequent
pro-IL-1β and pro-IL-18 cleavage and release of their biologically active
forms. The long C-terminal ‘tail’ of P2X7R allows it to undergo a
conformational change resulting in the so-called ‘permeability transition’;
that is, P2X7R changes from a cationic channel to a wider pore, allowing
transmembrane fluxes of small hydrophilic molecules (including ATP) with a
molecular mass of approximately 900 Da (refs 69, 70).

In addition, some P2XRs, for example P2X,R, can form heterotrimers 
with P2X,R, P2X,R and P2X;R subunits, whereas P2X,R forms hetero- 
oligomers with P2X3R. Together these findings indicate that despite the 
fact that all P2XRs are activated by ATP and share significant sequence 
homology with each other, highly distinct functions are unique for indi- 
vidual members of the P2XR family. 


Endogenous and pharmacological ligands for P2XR 

In contrast to P2YR signalling, for which more than one native agonist exists, 
human P2XRs share ATP as their main endogenous agonist. Owing to
the central role of P2X7R during inflammatory disorders, great efforts
have been made to develop selective antagonists. For example, the P2X7R
antagonist AZD9056 was used in patients with rheumatoid arthritis. Although
initial studies were promising, Phase IIb/III studies with different P2X7R
antagonists failed to improve long-term clinical outcomes. In addition,
there are ongoing Phase II studies with P2X,R and P2X3R antagonists in 
chronic pain and chronic cough (http://www.clinicaltrials.gov/),
highlighting the potential for P2XRs as druggable targets in the treatment of
inflammatory disorders.


Functional roles of P2XR in unchallenged mice 

Physiological roles of P2XRs are indicated by phenotypic manifestations 
in mice lacking individual P2XR subtypes. Although all single P2rx
knockout mice are viable and survive to adulthood, some of them reveal
unexpected phenotypes, suggesting that functional roles of individual receptor
subunits cannot be compensated for by others. For instance, P2rx1−/−
mice show reduced vas deferens contraction and are infertile, whereas
P2rx2−/− mice develop severe progressive hearing loss, and P2rx3−/−
mice experience urinary bladder hyporeflexia. Mice lacking both P2rx2
and P2rx3 have enlarged spleens and increased numbers of immune cells.
These findings suggest P2XR-subtype-specific signalling functions under 
physiological conditions, including immune functions. 


P2X7R signalling during inflammatory disease

Compelling evidence implicates P2XR during inflammation and immune 
response against microbes’””*. Although several other P2XRs are func- 
tional during inflammation (for example, P2X,R), P2X,R in particular 
has been shown to affect the outcomes of inflammatory or infectious
diseases. This may be due to the fact that P2X7Rs are predominantly expressed
on immune cells such as mast cells, macrophages, microglia and DCs
(Supplementary Table 1). Indeed, functional studies implicate P2X7R in immune
responses against bacterial and parasitic infection. For example, activation
of P2X7R is involved in the formation of macrophage multinucleated giant
cells, an important step for the control of tuberculosis (Fig. 3). Other
studies implicate P2X7R signalling in the elimination of intracellular
microbes (such as Mycobacterium tuberculosis, Chlamydia psittaci, Leishmania
amazonensis or Toxoplasma gondii) by either killing of the pathogen or
by inducing cell death of infected macrophages (Fig. 3). Human studies
indicate that loss-of-function mutations in the P2RX7 gene are associated
with increased susceptibility to tuberculosis or toxoplasmosis. Owing
to its proinflammatory role via activation of the inflammasome, and its
direct cytotoxic or pro-apoptotic function, many reports implicate a role
for the ATP–P2X7R axis in tumour suppression. P2X7R expression is lower
in some types of cancer and loss-of-function mutations in the P2RX7 gene
have been linked to the pathogenesis of chronic lymphatic leukaemia.
Additional evidence comes from studies demonstrating that loss-of-function
mutations of the P2RX7 gene in patients with breast cancer are associated
with an increased risk of progression to metastatic disease states. This
study specifically implicated DC-P2X7R signalling in resistance to
chemotherapy (Fig. 3). DCs present antigens from dying cancer cells to prime
tumour-specific interferon-γ (IFN-γ)-producing T cells. Dying tumour
cells release ATP, which activates P2X7R expressed on DCs, which in turn
causes inflammasome assembly and subsequent secretion of IL-1β (Fig. 3).
Accordingly, anticancer chemotherapy was shown to be inefficient against
tumours established in purinergic receptor P2rx7−/− hosts. Together,
such findings implicate ATP signalling through the P2X7R in host-defence
mechanisms against intracellular pathogens and cancers.

Figure 3 | P2X7R signalling during infection and inflammation. P2X7R
is required for mounting an appropriate inflammatory response to defend
against invading pathogens, for example during intracellular killing of
Mycobacterium tuberculosis by macrophages. Dying tumour cells release ATP,
which activates P2X7R expressed on DCs, which in turn promotes the priming
of IFN-γ-producing cytotoxic CD8⁺ T cells that kill cancer cells. On the
other hand, P2X7R signalling on DCs and concomitant T-cell priming
contributes to allergic disease states, such as CD8⁺ T-cell-elicited contact
dermatitis. DC-mediated T-cell priming under the control of P2X7R signalling
has also been shown to promote TH1 responses that are implicated in
graft-versus-host disease, which contributes to the rejection of a transplanted
organ. Similarly, P2X7R-mediated T-cell priming towards a TH2 response
promotes allergic airway disease during asthma. Priming of TH17 cells is
critical during psoriasis and contributes to intestinal inflammation as occurs
during IBD. P2X7R signalling on enteric neurons or mast cells has been
implicated in promoting intestinal inflammation during IBD.

It is particularly interesting that P2X7R signalling on DCs can have very
different effects on T-cell priming, depending on the specific context,
including CD8⁺ and CD4⁺ T-cell differentiation (Fig. 3). In contrast to its
essential role in immune priming for response against tumours or pathogens,
the involvement of P2X7R in the polarization of antigen-specific effector
T cells by DCs contributes to the induction and maintenance of chronic
inflammation. P2X7R signalling on DCs is involved in the sensitization phase
of allergic disorders such as contact hypersensitivity (CD8⁺ T-cell priming)
and asthma (CD4⁺ T cells, TH2 response), and contributes to transplant
rejection (CD4⁺ T cells, TH1 response; Fig. 3). For example, recent studies
implicate P2X7R signalling in graft-versus-host disease, a common
complication following an allogeneic tissue transplant, in which immune cells
in the tissue recognize the recipient as ‘foreign’, leading to an
immunological reaction of transplanted immune cells against the host. Indeed,
P2X7R inhibition or deficiency on DCs is associated with reduced severity of
graft-versus-host disease. As such, P2X7R signalling of antigen-presenting DCs
led to an increased expression of CD80 and CD86 in vitro and in vivo and
activated a cascade of proinflammatory events, including signal
transducer and activator of transcription 1 (STAT1) phosphorylation, IFN-γ
production and donor T-cell expansion. Other studies report that
P2X7R-mediated priming can contribute to TH17-driven autoimmune
diseases, such as psoriasis (Fig. 3). By triggering the production of
pro-allergic mediators from eosinophils, mast cells, macrophages and DCs,
P2X7R signalling also contributes to the effector phase and chronification
of allergic disorders. Furthermore, increased expression of P2X7R can
be found on eosinophils and macrophages in asthma (Fig. 3), and
loss-of-function mutations in the P2rx7 gene have been associated with
attenuated risk of allergen sensitization and asthma.

In line with these findings, other studies report a detrimental role of
P2X7R in promoting excessive inflammation during IBD, by showing that
ATP derived from commensal bacteria activates a unique subset of lamina
propria cells (CD70high CD11clow cells), leading to TH17 cell
differentiation (Fig. 3). Indeed, germ-free mice exhibit lower concentrations
of luminal ATP, accompanied by fewer lamina propria TH17 cells. P2X7R
also participates in IBD pathogenesis by mediating enteric neural death
(Fig. 3). Finally, mast-cell-dependent mechanisms of intestinal
inflammation are under the control of P2X7R, as increased P2X7R expression
can be found in mast cells from Crohn’s patients and inhibition of P2X7R
on mast cells dampens intestinal inflammation (Fig. 3).

Together, these findings expose P2X7R signalling during inflammation
as a double-edged sword: these receptors have a critical role in
mediating appropriate inflammatory and immunological responses against
invading pathogens or cancer cells, respectively, but contribute to chronic
inflammatory disease states in a wide range of inflammatory disorders,
such as chronic lung disease, asthma or IBD, when activated
inappropriately.


Termination of ATP signalling 


Termination of P2R signalling involves the conversion of ATP/ADP to
adenosine within the extracellular compartment by the activity of
ectonucleotidases. The four main groups of ectonucleotidases are the
ectonucleoside triphosphate diphosphohydrolases (NTPDases),
ecto-5’-nucleotidase (CD73), ectonucleotide pyrophosphatase/phosphodiesterases
and alkaline phosphatases. NTPDases represent a family of ubiquitously
expressed membrane-bound enzymes. The catalytic sites of plasma
membrane-expressed NTPDases 1-3 and 8 are oriented towards the extracellular
milieu. Owing to its high expression in many tissues and its ability to
catalyse the conversion of ATP (and ADP) down to AMP, NTPDase1 (CD39) has
been found in many studies to have a functional role in the termination of
P2R signalling. Next, extracellular AMP is converted to adenosine by
CD73 (Fig. 1). Therefore, termination of ATP signalling is closely linked
to the generation of extracellular adenosine. In many instances,
adenosine-elicited P1R signalling dampens acute inflammation and tissue
injury, thus opposing inflammatory functions of P2Rs (Fig. 1).

Consistent with a protective role for the CD39/CD73 pathway in
terminating inflammatory P2R signalling, and concomitantly increasing
extracellular adenosine levels and signalling events, several studies show that
Cd39−/− or Cd73−/− mice are prone to tissue injury during inflammatory
conditions such as acute lung injury or intestinal inflammation. For
example, patients with a single nucleotide polymorphism associated with
low levels of CD39 expression have increased susceptibility to Crohn’s
disease, suggesting that deficiency in CD39 could be associated with IBD
in humans. Other reports suggest that CD39 exerts a protective
thromboregulatory function in stroke by preventing P2R-mediated thrombosis.
Moreover, several studies implicate the CD39/CD73 pathway in the
immunosuppressive roles of regulatory T cells (Treg). These are a group of CD4⁺
lymphocytes that suppress T-cell responses against a variety of pathogens
and control inappropriate immune activation, thus limiting collateral tissue
damage but allowing pathogen persistence. As such, Treg cells from
Cd39−/− mice demonstrate attenuated suppressive functions in vitro
and fail to block rejection of allografts in vivo. Similarly, Cd73−/− mice
fail to resolve lung injury induced by lipopolysaccharide inhalation due
to impairment of Treg functions.

The ambivalence of CD39/CD73-mediated control of Treg is further
exemplified during infections with human immunodeficiency virus (HIV),
the retrovirus known to cause AIDS in humans. HIV infections are
characterized by a progressive CD4 lymphopenia in conjunction with
defective HIV-specific CD8 responses that are critical for the control of viral
replication. As such, the consequences of Treg expansion, as seen during
HIV infection, could either be beneficial, by suppressing generalized
T-cell activation, or be fatal, owing to attenuated HIV-specific
responses that promote viral persistence. For example, studies
demonstrate that HIV-1-positive patients have an increase of Treg-associated
expression of CD39. These findings indicate that the CD39/CD73
pathway is involved in Treg suppression in HIV infection. A genetic
association study demonstrated that a polymorphism in the CD39 gene is
associated with attenuated CD39 expression and slower progression to
AIDS in HIV-infected patients. Thus, it can be speculated that CD39⁺
Tregs are the most potent Treg subset to inhibit HIV-specific T-cell responses.
This could at least in part account for their association with disease
progression. Other examples for a detrimental role of CD39-dependent ATP
breakdown come from studies of autoimmune hepatitis in which natural
killer T cell dysfunction in Cd39−/− mice protects against concanavalin
A-induced hepatitis. Heightened levels of apoptosis of Cd39−/− natural
killer T cells in vivo and in vitro appear to be driven by unimpeded
activation of P2X7R. Similarly, enzymatic removal of ATP by apyrase
(conversion of ATP/ADP to AMP) or ectopic CD39 expression attenuates clearance
of apoptotic cells, indicating a detrimental role for CD39-dependent ATP
phosphohydrolysis in dampening efficient corpse clearance via P2Y2R
signalling. Other studies demonstrate the existence of bacterial
ectotriphosphate diphosphohydrolases (similar to human CD39), which are
critical for the intracellular multiplication of Legionella pneumophila by
preventing P2R-elicited immune responses. As such, these findings implicate
bacterial ectotriphosphate diphosphohydrolases in virulence. Together
these studies exemplify an ambivalent role for the termination of P2R
signalling via enzymatic phosphohydrolysis. Although this pathway is
critical in preventing excessive P2R-dependent inflammation in a sterile
environment, CD39 function can become detrimental by impairing the
appropriate clearance of apoptotic debris, by limiting inflammation directed
against bacterial infections or by generating an immunosuppressive
environment, which promotes the development or progression of cancer.


Functional role of P1 signalling during inflammation 


Extracellular AMP generated by phosphohydrolysis of precursor
nucleotides (for example, ATP or ADP) has no clearly characterized signalling
function (for example, through specific AMP receptors). However,
extracellular AMP serves as the metabolic substrate for the extracellular
generation of adenosine via CD73 (Fig. 1). Once generated within the
extracellular compartment, adenosine can function via activation of four
distinct P1 receptors: ADORA1, ADORA2A, ADORA2B or ADORA3. Adenosine
signalling is terminated via uptake of adenosine from the extracellular
towards the intracellular compartment through equilibrative nucleoside
transporters, after which adenosine is metabolized to inosine via adenosine
deaminase, or to AMP via adenosine kinase. Several studies implicate adenosine
signalling in dampening excessive inflammation. For example, Adora2a−/−
mice experience increased inflammation including extensive tissue damage,
more prolonged and higher levels of proinflammatory cytokines, and
mortality when exposed to sub-threshold doses of inflammatory stimuli. Other
studies demonstrate that ADORA2B signalling dampens excessive
inflammation during acute lung injury, promotes ischaemia tolerance and
improves anaerobic carbohydrate metabolism. Similarly, genetic deletion
or pharmacological blockade of equilibrative nucleoside transporters is
associated with increased adenosine levels and improved outcomes during
inflammatory disease states. In most instances, the anti-inflammatory
signalling effects of adenosine are associated with improved outcomes
during inflammatory diseases such as IBD, or during sepsis induced by
caecal ligation and puncture. However, other studies indicate that the
anti-inflammatory effects of adenosine signalling can be detrimental in
containing an infection with live bacteria. For example, a recent study
demonstrates that antagonism of P1 receptors (for example, ADORA2B) can
be useful in enhancing macrophage-mediated bacterial phagocytosis and
improving polymicrobial sepsis survival in mice. Together, these studies
highlight that P1 and P2 receptors frequently have opposing effects in
biological systems, and that shifting the balance from purinergic P2YR
and P2XR signalling towards adenosine-mediated P1 signalling is an
important therapeutic concept in efforts to dampen pathological inflammation
and promote healing.


Conclusions 


The field of extracellular nucleotide signalling and metabolism is a dynamic 
area of research with important opportunities for novel treatments for 
inflammatory or infectious diseases. On the one hand, P2R signalling 
functions to coordinate appropriate immune responses against invading 
pathogens or tumours. Indeed, pharmacological approaches that amplify 
extracellular ATP signalling hold promise as therapies for the treatment 
of cancer or during uncontrolled infections with live pathogens. Such strat- 
egies could include inhibition of ATP breakdown (for example, via nucle- 
otidase inhibitors) or treatment with P2 receptor agonists. Conversely, 
inadequate P2R signalling has been associated with excessive inflamma- 
tion, chronification and inappropriate resolution and fibrosis in a wide 
range of inflammatory diseases. In this context, treatment strategies that 
block P2R signalling, promote extracellular conversion of ATP to aden- 
osine and activate adenosine receptors have been implicated in the treat- 
ment of acute or chronic inflammatory diseases. We therefore anticipate 
that compounds targeting these pathways will be further exploited in the 
treatment of inflammatory conditions in human patients in the near future. 

Note added in proof: Two reports appeared online regarding the atomic
structure of the P2Y12R while the current review was in press. The first
report provides a 2.6 Å resolution crystal structure of the human P2Y12R
in complex with the non-nucleotide antagonist AZD1283 (ref. 115), thus
providing important insights for the development of P2Y12R ligands and
allosteric modulators as drug candidates. The second report provides
the structures of the human P2Y12R in complex with a full agonist
(2-methylthio-adenosine-5'-diphosphate) at a resolution of 2.5 Å, and the
corresponding ATP derivative 2-methylthio-adenosine-5'-triphosphate at
3.1 Å resolution. The agonist-bound P2Y12R structure answers ambiguities
surrounding P2Y12R-agonist recognition, and suggests unexpected
interactions with several residues.

Many natural products that contain basic nitrogen atoms, for example alkaloids like morphine and quinine, have the
potential to treat a broad range of human diseases. However, the presence of a nitrogen atom in a target molecule can
complicate its chemical synthesis because of the basicity of nitrogen atoms and their susceptibility to oxidation.
Obtaining such compounds by chemical synthesis can be further complicated by the presence of multiple nitrogen atoms, but it
can be done by the selective introduction and removal of functional groups that mitigate basicity. Here we use such a
strategy to complete the chemical syntheses of citrinalin B and cyclopiamine B. The chemical connections that have been
realized as a result of these syntheses, in addition to the isolation of both 17-hydroxycitrinalin B and citrinalin C (which
contains a bicyclo[2.2.2]diazaoctane structural unit) through carbon-13 feeding studies, support the existence of a common
bicyclo[2.2.2]diazaoctane-containing biogenetic precursor to these compounds, as has been proposed previously.


The prenylated indole alkaloids are an emerging class of natural products
typified by the presence of an indole ring, or derivatives thereof
(that is, spirooxindole or pseudoindoxyl), decorated by one or more
prenyl groups or the vestige of a prenyl group. Isolates from this family
of natural products include citrinalins A and B (Fig. 1, 1 and 2) and
cyclopiamines A and B (4 and 6), which are the focus of this Article. The
modifications of the indole core in the prenylated indole alkaloid family,
which occur by a reaction with dimethylallyl pyrophosphate, result
in the introduction of a chromene unit as is found in (+)-stephacidin A
(10; see blue highlighted portion) or a bicyclo[2.2.2]diazaoctane core
that is typical of many congeners, including 11 and 12 (ref. 2) (see red
highlighted portion).

Although structurally similar, the prenylated indole alkaloids display
a diverse range of bioactivities including anti-tumour, insecticidal,
anthelmintic, calmodulin-inhibition and antibacterial properties. The
recent discovery of citrinadins A (ref. 4) and B (ref. 5) (7 and 8) and
PF1270A-PF1270C (9a-9c) has added an unprecedented dimension to the
structural motifs afforded by the Penicillium strains, and has raised
several questions as to the biogenesis of these structurally related
alkaloids. Recently, syntheses of citrinadins A and B have been achieved.
Particularly intriguing to us is a subset of this emerging subclass
including citrinalins A and B (1 and 2) and cyclopiamines A and B (4 and
6), which, like the citrinadins, lack the bicyclo[2.2.2]diazaoctane
framework and, remarkably, possess an alkyl nitro group. Cyclopiamines A
and B (4 and 6) were discovered in 1979 in a toxinogenic strain of
Penicillium cyclopium, whereas citrinalins A and B (1 and 2) were
discovered in 2010 in a strain of Penicillium citrinum. Although natural
products that possess aryl nitro groups are known, those that contain
aliphatic nitro groups are extremely rare. As a result, the citrinalins
and cyclopiamines, which also possess three nitrogen atoms in chemically
distinct environments, are unusual and are therefore attractive targets
for synthesis. The synthetic studies described here have culminated in
the total syntheses of ent-citrinalin B (ent-2; ent, enantiomer) and
cyclopiamine B (6), and, along with ¹³C feeding studies that have
resulted in the isolation of two new citrinalins, provide support for a
proposed biogenesis of the subset of prenylated indole alkaloids that
lack the bicyclo[2.2.2]diazaoctane core.


Biosynthetic connections 


A stimulating connection may be drawn between cyclopiamines A and
B via the intermediacy of nitronate iminium ion 5 (ref. 9) (Fig. 1). The
interconversion of 4 and 6 was demonstrated by heating either compound
in a mixture of dioxane and water or in dimethylformamide (DMF). This
led to a proposal that 6, which is the more stable of the two isomers
(we have computed 6 to be 9.6 kcal mol⁻¹ lower in energy than 4 in a DMF
solvent model; see Supplementary Information), may in fact be an
isolation artefact. Given the likelihood that the citrinadins,
citrinalins and cyclopiamines are all oxidative degradation products of
a precursor containing a bicyclo[2.2.2]diazaoctane ring, such as
marcfortine A (11; in the case of the citrinadins) or stephacidin A (10;
in the case of the citrinalins and cyclopiamines), we wondered whether
the citrinalins could be transformed to the cyclopiamines. On the basis
of this assumption, it is particularly baffling that, unlike
cyclopiamines A and B, which are related by an aza-Henry (or
nitro-Mannich) reaction as shown in Fig. 1 (4 ⇌ 6, via 5), citrinalin A
and the originally proposed structure of citrinalin B (3) would be
related not by the formal epimerization of the C22 stereocentre but
rather by the nature of the relative configuration of the C14 carbon
(highlighted in 2 and 3). On the basis of the connection between
cyclopiamines A and B demonstrated previously, we intuited that the
structure of citrinalin B may be better represented by 2. To support this
proposal, we undertook a computational simulation of the ¹H and ¹³C NMR
spectra that would be expected for the neutral and salt forms of
citrinalins A and B (Supplementary Information). As has been
convincingly demonstrated in numerous cases, this method provides an
accurate prediction of the structures of complex natural products. We
found that the computed and empirical data for the trifluoroacetic acid
salt form of citrinalin A are in good agreement
with those reported in ref. 10. The corrected mean absolute deviations
(CMAD) in the ¹H and ¹³C NMR resonances are 0.21 and 2.0 p.p.m.,
respectively (the largest outliers are 1.0 and 5.2 p.p.m., respectively).
However, the computed data for the trifluoroacetic acid salt form of 3
(the originally proposed structure of citrinalin B) differ significantly
from those recorded using the naturally occurring material (CMADs, 0.45
and 2.0 p.p.m.; largest outliers, 2.3 and 9.6 p.p.m. for ¹H and ¹³C,
respectively). The best match to the reported spectral data was found to
correspond to 2 in its neutral form (CMADs, 0.12 and 1.6 p.p.m.; largest
outliers, 0.38 and 4.4 p.p.m. for ¹H and ¹³C, respectively), which
corroborates the potentially similar biosynthetic connection that has
been established for the cyclopiamines (outlined in Fig. 1). As a result,
we chose to proceed on the assumption that 2 most probably represents the
correct structure of citrinalin B. Ultimately, a reanalysis of the NMR
data of citrinalin B, collected in MeOH-d4 (Supplementary Information),
corroborates the assignment of 2 as the true structure of citrinalin B.

Figure 1 | Selected prenylated indole alkaloids. The prenylated indole
alkaloid family encompasses over 80 natural products, some of which contain a
bicyclo[2.2.2]diazaoctane core as in 10, 11 and 12. Recently, several members of
this family (for example 1 and 4) have emerged that do not possess this
structural motif. Me, methyl.
[Compound labels recovered from the figure: marcfortine A (11); paraherquamide A (12).]
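The two computational quantities quoted in this section, the 9.6 kcal mol⁻¹ energy gap between 4 and 6 and the CMAD values used to compare computed and measured NMR shifts, can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: the energy gap is treated as a free-energy difference at 298.15 K, the "correction" is taken to be a least-squares linear scaling of computed against experimental shifts, and the shift lists are hypothetical placeholders rather than data from ref. 10 or the Supplementary Information.

```python
import math
from statistics import mean

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def population_ratio(delta_e_kcal, temperature_k=298.15):
    """Boltzmann ratio [more stable]/[less stable] for an energy gap.

    Assumption: the computed energy gap is used as if it were a
    free-energy difference.
    """
    return math.exp(delta_e_kcal / (R_KCAL * temperature_k))


def linear_fit(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def cmad(computed, experimental):
    """Corrected mean absolute deviation and largest outlier (p.p.m.).

    Assumption: computed shifts are first scaled by a linear fit against
    the experimental shifts, then compared point by point.
    """
    slope, intercept = linear_fit(computed, experimental)
    corrected = [slope * c + intercept for c in computed]
    deviations = [abs(c - e) for c, e in zip(corrected, experimental)]
    return mean(deviations), max(deviations)


# Energy gap quoted in the text for 6 versus 4 (DMF solvent model).
print(f"6:4 ratio at 298.15 K: {population_ratio(9.6):.1e}")

# Hypothetical 13C shifts (p.p.m.), for illustration only.
experimental = [25.1, 48.3, 77.9, 121.4, 168.2]
computed = [27.0, 51.2, 81.5, 126.0, 175.9]
mad, worst = cmad(computed, experimental)
print(f"CMAD = {mad:.2f} p.p.m., largest outlier = {worst:.2f} p.p.m.")
```

Under these assumptions a gap of 9.6 kcal mol⁻¹ corresponds to a population ratio of the order of 10⁷ in favour of 6, which is in line with the suggestion that 6 could accumulate on handling. A candidate structure with a lower CMAD and a smaller largest outlier is the better match, which is how 2 and 3 are ranked above.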


Synthesis 

As outlined in Fig. 2, cyclopiamine B (6) can be obtained from the
enantiomer of citrinalin B (ent-2) by using a chromanone rearrangement
to forge the tetrahydroquinolone structural moiety found in the
cyclopiamines. In turn, ent-2 could be taken back using an
‘indole-to-spirooxindole’ transform to fused hexacycle 13. Fused indole
13 would arise from tricycle 14, which may be prepared from diene 15, the
t-butyldimethylsilyl variant of which was first prepared in ref. 13, and
tetrahydroindolizinone 16, which would ultimately arise from D-proline
(17).

Figure 2 | Retrosynthetic analysis plan for cyclopiamine B and citrinalin B. The syntheses of natural products 2 and 6 are expected to arise from common
intermediate 13. TIPS, triisopropylsilyl.

We initiated our synthetic studies with the protection of D-proline
by t-butoxycarbonyl (Fig. 3), which was followed by the reduction of
the carboxylic acid group and Swern oxidation of the resulting hydroxyl
to afford aldehyde 18 (ref. 14). Alkynylative homologation of the aldehyde
group of 18 using the Ohira-Bestmann method, followed by removal
of the t-butoxycarbonyl group and acylation with 2-cyanoacetyl chloride,
gives alkyne 19. This serves as a substrate for an unprecedented
formal cycloisomerization that probably proceeds via a metal vinylidene
intermediate, anti-Markovnikov hydration and Knoevenagel
condensation to give tetrahydroindolizinone 16. At this stage, a
SnCl4-catalysed Diels-Alder [4 + 2] reaction between 16 and diene 15, and
a subsequent basic work-up, affords an enone (not shown), which is
iodinated to yield iodoenone 20 (ref. 18). A mild hydrolysis of the
nitrile group of 20 is achieved using Pt-complex 21 (ref. 19) to afford
the corresponding carboxamide, which serves as a substrate for a Hofmann
rearrangement that is effected with phenyliodosyl bistrifluoroacetate to
yield carbamate 22 (ref. 20). Suzuki cross-coupling of 22 with known
boronic ester 23 (ref. 21) gives adduct 24, which is efficiently
converted to fused indole 25 using two sequential reductions, all in
accord with the effective protocols established in ref. 22.

The face-selective oxygenation of C2/C3-fused indoles is a
well-established route to hydroxyindolenines, which serve as precursors to
the corresponding spirooxindoles. We therefore reasoned that the
oxygenation of indole 25 (Fig. 4a) could be a path to the spirooxindole
structural moiety found in 1, 2, 4 and 6. On the basis of related
precedents for heteroatom-directed oxygenation, we expected the
carbamate group of 25 to direct oxygenation to the α-face and provide
28. Surprisingly, the use of Davis’ oxaziridine (29; 3.0 equiv.) leads to
26 and trace amounts of both hydroxyindolenine 28 and spirooxindole 27
(spirooxindole 27 arises via the intermediacy of hydroxyindolenine 26).
A survey of other oxaziridines, including 30 and 31, leads, at best
(using 31), to a 1:1 ratio of the desired hydroxyindolenine, 28, and
both hydroxyindolenine 26 and spirooxindole 27. Because the inherent
face selectivity for the oxygenation of 25 is poor, attention was turned
to the use of reagent control to achieve the desired diastereoselective
oxygenation. In this regard, we were drawn to peptide-derived catalysts
developed for oxygenations. Following an investigation of a focused
library of peptide catalysts, 32 (Fig. 4b) emerged as the superior catalyst
(20 mol% loading) and provided hydroxyindolenine 28 in 83% yield
from 25. Hydroxyindolenine 28 rearranges with heating using Sc(OTf)3
over 2 h to afford pseudoindoxyl 33 (Fig. 4c) instead of the desired
spirooxindole. The equilibrium between pseudoindoxyls and spirooxindoles
is well recognized and has been studied for the migration of C2
alkyl substituents and C2 aryl substituents. However, despite prolonged
heating, further rearrangement of pseudoindoxyl 33 to the desired
spirooxindole was not observed. It is possible that an intramolecular
hydrogen bond stabilizes pseudoindoxyl 33 against further rearrangement
(a bond distance of 2.24 Å is computed for the pseudoindoxyl
carbonyl group and N-H proton of the carbamate group in 33; see
Supplementary Information). Evidence for a stabilizing intramolecular
hydrogen bond in 33 comes from the observation that hydroxyindolenine 26
(prepared by oxidation of 25 with Davis’ oxaziridine) rearranges readily
at room temperature in the presence of mild acid to spirooxindole 27;
a pseudoindoxyl generated from 26 would lack the analogous stabilizing
hydrogen bond. However, the possibility exists that 26 proceeds to
an epoxide intermediate (see A in inset in Fig. 4c) that rearranges to 27.
The difficulty of further rearranging pseudoindoxyl 33 caused us to
consider alternative approaches that would produce the desired
spirooxindole structural moiety of the citrinalins and cyclopiamines.

Amino compound 35 (Fig. 5) was prepared on the assumption that 
an amino group, or some oxidized derivative thereof (for example the 
corresponding hydroxylamine), could serve as a hydrogen-bond donor 
to effect stereoselective oxygenation of the indole C2-C3 bond and then, 
by further oxidation to a nitroso or nitro group, remove the presumed 
intramolecular hydrogen bond that may stabilize the pseudoindoxyl 
form (as in 33). It seemed reasonable that this sequence would facili- 
tate the eventual conversion of 35 to nitro spirooxindole compound 
36. Initial experiments established that epoxidation of the chromene 
ring was a competing reaction that occurred under various oxygenation
conditions. As such, we opted to effect a Wacker oxidation of 25
to afford chromanone 34 (Fig. 5), which would be advantageous because
the chromanone unit is found in the citrinalins and cyclopiamines.
Remarkably, treatment of 35 (following removal of the methoxycarbonyl
group in 34) with an excess of dimethyldioxirane (formed in situ from
acetone and Oxone) affords spirooxindole 36 as the major product
(diastereomeric ratio, 4:1) where the spiro centre is as desired and the
nitro group has been installed. Studies of dimethyldioxirane oxidations of
indoles to spirooxindoles suggest that spirooxindole 36 might arise from
epoxide B (Fig. 5, inset). Therefore, it is possible that the introduction of
the chromanone diminishes the participatory role of the indole nitrogen
lone pair leading, after rearrangement (see direction of arrow in B),
to 36. With spirooxindole 36 in hand, what remained was a selective
removal of the tertiary amide carbonyl group by reduction, which had
to be accomplished in the presence of the chromanone and secondary
amide carbonyl groups as well as the newly introduced nitro group. After
extensive investigation, this task was effectively accomplished using a
modification of a known procedure by treating 36 with a variant of
Meerwein’s salt (Me3OBF4), which probably leads to a methylated
amidinium intermediate that is cleanly reduced with sodium
cyanoborohydride to give ent-citrinalin B (ent-2) in 66% yield (79% based
on recovered starting material). The spectroscopic data for the neutral
form of ent-2 are fully consistent with previous data reported for the
compound believed to be citrinalin B (ref. 10; corroborating the
computational predictions and reanalysis in MeOH-d4), except for the sign
of optical rotation, which is opposite. The structure of ent-2 was
unambiguously confirmed by X-ray crystallographic analysis of its HCl
salt. Ent-citrinalin B is easily converted to cyclopiamine B (6) on
treatment of ent-2 with sodium hydride and heating (to effect the
conversion of chromanone to tetrahydroquinolone) and subsequent
methylation of the resulting phenol. The structure of cyclopiamine B (6)
was also unambiguously confirmed by X-ray crystallographic analysis.
Thus, the synthesis of ent-2 and its conversion to 6 show that ent-2 is
the true structure of citrinalin B, albeit the enantiomer of the
naturally occurring material.

Figure 3 | Preparation of fused hexacycle 25. The use of a Diels-Alder
reaction involving a proline-derived indolizidinone dienophile affords a key
tricycle that is advanced to hexacycle 25 by Suzuki coupling to boronic ester
23. Reagents and conditions are as follows: (1) di-t-butyl dicarbonate (Boc2O),
NaHCO3, H2O and tetrahydrofuran (THF), room temperature (RT = 23 °C);
(2) BH3·THF, THF, 0 °C to RT; (3) (COCl)2, dimethylsulphoxide, CHCl3,
diisopropylethylamine, −78 °C; (4) dimethyl (diazomethyl)phosphonate,
K2CO3, MeOH, 0 °C to RT; (5) 4N HCl and dioxane, 0 °C to RT;
(6) 2-cyanoacetylchloride, Et3N, CHCl3, 0 °C to RT; (7) acetonitrile
bis[2-diphenylphosphino-6-t-butylpyridine] cyclopentadienylruthenium(II)
hexafluorophosphate (8 mol%), acetone and H2O, 70 °C; (8) 15, SnCl4, −78 to
−42 °C; (9) I2, 4-dimethylaminopyridine, pyridine and CCl4, 60 °C; (10)
21 (20 mol%), EtOH and H2O, RT; (11) phenyliodosyl bistrifluoroacetate,
MeOH, RT; (12) dppfPdCl2 (10 mol%), K3PO4, DMF, 40 °C; (13) Zn dust,
NH4Cl, HCO2NH4, p-TsOH, MeOH, RT; (14) NaCNBH3, 1N HCl(aq.), 0 °C to
RT. dppf, (diphenylphosphino)ferrocene; Et, ethyl; t-Bu, t-butyl; Ts, tosyl.

Figure 4 | Face-selective oxygenation of fused hexacycle 25. a, Oxidative
rearrangement studies of fused indole 25 with a range of oxaziridines leads
predominantly to the undesired, epimeric, hydroxyindolenine (26) and
spirooxindole (27). b, Use of indole oxidation peptide catalyst 32 to effect
oxidation yields the desired hydroxyindolenine (28). c, Hydroxyindolenine
28 rearranges to an undesired pseudoindoxyl (33), whereas the epimeric
hydroxyindolenine (26) affords the corresponding spirooxindole (27).
Reagents and conditions are as follows: (1) oxaziridine (29, 30 or 31), CH2Cl2,
RT; (2) 32 (20 mol%), 4-dimethylaminopyridine, diisopropylcarbodiimide,
H2O2, CHCl3, 4 °C; (3) Sc(OTf)3, toluene, 110 °C; (4) 23 mM HCl, CHCl3, RT.
Bn, benzyl; Cbz, carboxybenzyl; Ph, phenyl.
[Panel a outcomes recovered from the figure: 29 (3.0 equiv.), 61% yield of 26 (trace 28 and 27); 30 (3.0 equiv.), 1:4 ratio of 28 to (27 + 26); 31 (3.0 equiv.), 1:1 ratio of 28 to (27 + 26).]

Figure 5 | Completion of the syntheses of ent-citrinalin B and
cyclopiamine B. The total syntheses of 2 and 6 required the identification
of conditions that accomplished the oxidation of the amino group and
spirooxindole formation in one pot as well as unique conditions for the
selective reduction of the tertiary amide carbonyl group. The
rearrangement of ent-citrinalin B (2) to cyclopiamine B (6) was also
demonstrated. Reagents and conditions are as follows: (1) Pd(OAc)2
(40 mol%), benzoquinone, H2SO4, MeCN and H2O, RT; (2) Me2S,
methanesulphonic acid, 40 °C; (3) Oxone (10 equiv.), NaHCO3, acetone and
H2O, 0 °C to RT; (4) Me3OBF4, CH2Cl2, 4 Å molecular sieves, 45 °C;
(5) NaCNBH3, MeOH, 0 °C; (6) NaH, DMF, 60 °C; (7) MeI, K2CO3, acetone,
60 °C. b.r.s.m., based on recovered starting material; d.r.,
diastereomeric ratio; Oxone, potassium peroxymonosulphate.
[Yields recovered from the figure: steps (4) and (5), 66% yield (79% b.r.s.m.) over 2 steps; steps (6) and (7), 78% yield over 2 steps.]


Biosynthetic considerations

The total syntheses of ent-citrinalin B (ent-2; 19 steps from D-proline,
5.5% overall yield) and cyclopiamine B (6; 21 steps from D-proline, 4.3%
overall yield) not only unambiguously establish the structures of these
metabolites, but also provide possible insight into the biogenesis of these
natural products (especially as to the possible formation of the
cyclopiamines from the citrinalins).

The citrinalins, and in turn the cyclopiamines, probably arise from
a bicyclo[2.2.2]diazaoctane precursor. However, such a precursor was
unknown before the findings that are reported herein (see below).
Consistent with numerous biosynthetic studies of the prenylated indole
alkaloids, the structural features of 1, 2, 4 and 6 suggest that
tryptophan, proline and two isoprene units are biosynthetic precursors to
these compounds. Although no biosynthetic studies on 1 and 2 or 4 and 6
or the related citrinadins and PF1270 alkaloids have appeared, it has been
suggested that they are derived from bicyclo[2.2.2]diazaoctane precursors
that suffer the ‘loss’ of one diketopiperazine carbonyl group. Through
the isolation of 17-hydroxycitrinalin B (37; Fig. 6a) and, more
importantly, citrinalin C (38) following a series of stable isotope
labelling experiments (summarized in Fig. 6b; see Supplementary
Information), we have now obtained support for the possible biogenesis of
the citrinalins and cyclopiamines from a precursor bearing the
bicyclo[2.2.2]diazaoctane moiety.

The NMR and mass spectroscopy characterization data for 37 are fully
consistent with the assigned structure. Moreover, the assigned relative
configuration fully corroborates the revised structure of citrinalin B (2).
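The step counts and overall yields quoted for the two syntheses (19 steps, 5.5%; 21 steps, 4.3%) can be put in per-step terms with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the overall yield is the simple product of the step yields, so the geometric mean gives the average yield per step. It also checks that the 78% two-step yield shown in Fig. 5 for the conversion of ent-2 to 6 accounts for the drop from 5.5% to 4.3%.

```python
def average_step_yield(overall_yield, steps):
    """Geometric-mean yield per step, assuming overall = product of steps."""
    return overall_yield ** (1.0 / steps)


# Overall yields and step counts quoted in the text (from D-proline).
routes = {
    "ent-citrinalin B (ent-2)": (0.055, 19),
    "cyclopiamine B (6)": (0.043, 21),
}

for name, (overall, steps) in routes.items():
    per_step = average_step_yield(overall, steps)
    print(f"{name}: {steps} steps, {overall:.1%} overall, "
          f"about {per_step:.1%} per step")

# Consistency check: 6 is made from ent-2 in two further steps (78% yield).
predicted_overall_6 = 0.055 * 0.78
print(f"Predicted overall yield of 6: {predicted_overall_6:.1%}")
```

Both routes average roughly 86% per step, and multiplying the 5.5% overall yield of ent-2 by the 78% two-step yield reproduces the quoted 4.3% for 6 to within rounding.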


Figure 6 | Isolation of two new citrinalins and ¹³C labelling studies.
a, Structures of 17-hydroxycitrinalin B and citrinalin C. Two additional
citrinalins, 37 and 38, were isolated on refractionation and reanalysis of
secondary metabolites from P. citrinum F53. b, Summary of the ¹³C labelling
studies. ¹³C incorporation studies of P. citrinum F53 reveal that glucose (pink),
anthranilic acid (blue) and ornithine (red) are biosynthetic precursors to
the citrinalins.
[Labels recovered from the figure: labelled precursors [1-¹³C]glucose, [U-¹³C]anthranilic acid and [U-¹³C]ornithine; citrinalin A (1) (R = H, α-NO2); citrinalin B (2) (R = H, β-NO2); 17-hydroxycitrinalin B (37) (R = OH, β-NO2).]


©2014 Macmillan Publishers Limited. All rights reserved 


Figure 7 | Biosynthetic proposal for citrinalins. Consistent with previous
reports on the bicyclo[2.2.2]diazaoctane congeners, the citrinalins probably
arise through an intramolecular Diels-Alder reaction to form citrinalin C
(38), which is followed by a decarboxylation event and amine-group oxidation
to the nitro group.


By analogy to citrinalin B (2), the absolute configuration of 37 was assigned
as 1S,14R,16R,17R,22R. 17-Hydroxycitrinalin B (37) was initially isolated
from P. citrinum F53 grown in a nitrogen-depleted culture medium.
Stable isotope feeding studies with [U-¹³C]anthranilic acid and
[1-¹³C]glucose gave significant ¹³C labelling (Supplementary
Information). High levels of [U-¹³C]ornithine were also incorporated into
37, and additional feeding studies with [U-¹³C]proline gave almost
undetectable labelling. Ornithine is a well-known biosynthetic precursor to
proline, but to our knowledge it has never been reported as an efficient
substrate for isotopic labelling of the putative proline-derived atoms in
the biosynthesis of prenylated indole alkaloids of fungal origin bearing
the bicyclo[2.2.2]diazaoctane moiety. The labelling investigations
suggest that 17-hydroxycitrinalin B (37) might arise from
3-hydroxyornithine, from 3-hydroxyproline or by the late-stage
oxygenation of the citrinalin A, B or C skeleton.

Citrinalin C (38), isolated as a minor component from the culture
medium of P. citrinum F53, gives NMR and mass spectroscopic data
(Supplementary Table 4) that are fully consistent with the relative and
absolute configuration illustrated for this natural product. The isolation
of 38, along with the congeners lacking the bicyclo[2.2.2]diazaoctane
structural moiety from P. citrinum F53, lends support to a
bicyclo[2.2.2]diazaoctane-containing precursor, which arises from a
committed intramolecular Diels-Alder cycloaddition step such as that
studied in detail for other congeners. Hydrolysis of the amide bridge of
citrinalin C (38; Fig. 7), followed by decarboxylation, and amino-group
oxidation to the nitro group, as proposed in the biosynthesis of the
structurally related citrinadin B, would then yield citrinalin A. These
latter steps are the subject of current biosynthesis studies.

A question that remained at this stage concerned the biogenesis of
citrinalin B. On the basis of observations of the cyclopiamine series
(see 4 ⇌ 6 in Fig. 1), we anticipated that citrinalin A (1) might be
converted to citrinalin B (2) via a nitronate iminium intermediate
analogous to 5. In the event, heating a solution of a naturally occurring
sample of citrinalin A (1) in DMF-d7 at 100 °C for 20 h leads to a 1:1
ratio of 1 and 2 (with complete conversion to citrinalin B (2) after
60 h; see Supplementary Fig. 22), confirming the connection of these
metabolites presumably by the same aza-Henry or nitro-Mannich
epimerization sequence established for the cyclopiamines. However, we
have observed some key differences. First, the epimerization in the
citrinalin series occurs at a qualitatively lower rate (probably owing to
a non-productive proton transfer from the vinylogous imide N-H to the
tertiary amine) and higher temperature. In addition, we have not been
able to achieve any observable conversion of citrinalin B to citrinalin A
even at elevated temperatures (165 °C) over prolonged periods (24 h). Our
current efforts are focused on gaining a deeper understanding of these
differences and exploring the biosynthetic conversion of citrinalin C to
citrinalin A.


Conclusion


We have reported the total syntheses of the prenylated indole alkaloids
ent-citrinalin B and cyclopiamine B. Our results unambiguously identify
citrinalin B through synthesis, a reanalysis of the naturally isolated
material and an X-ray crystallographic study. Our studies on the isolation
of metabolites from P. citrinum suggest that a
bicyclo[2.2.2]diazaoctane-containing metabolite such as citrinalin C (38)
is an intermediate in the biogenesis of citrinalins A (1) and B (2)
(Fig. 7). The extension of the synthetic methods reported here to the
syntheses of other prenylated indole alkaloids is ongoing and will be
reported in due course.


METHODS SUMMARY 


All reactions were performed under a nitrogen atmosphere using dry solvents
under anhydrous conditions, unless otherwise noted. Dry tetrahydrofuran,
toluene, methanol, triethylamine, benzene and diethyl ether were obtained by passing
the commercially available, oxygen-free solvents through activated alumina columns
from GlassContour. Dichloromethane was distilled over calcium hydride under a
nitrogen atmosphere. Yields refer to materials purified using silica gel column
chromatography. Full experimental details and characterization data for all new
compounds (¹H NMR, ¹³C NMR, mass spectrometry, infrared, Rf value), including 14-36,
2 and 6, appear in Supplementary Information. Crystallographic data were collected
on a MicroSTAR-H APEX II (ChexStar: RUA #1091) instrument, and the Bruker
SAINT and SADABS software programs were used for integrating and scaling the
data, respectively. The CYLVIEW program (developed by C. Y. Legault) was used
for X-ray depictions. Computational analyses were conducted following
conformational searches using the MMFF94 force field (SPARTAN’10). Density
functional theory calculations were performed with GAUSSIAN09 (B3LYP/6-31+G(d,p)
theory level). Full details are included in Supplementary Information.


Galanin neurons in the medial preoptic
area govern parental behaviour


Zheng Wu, Anita E. Autry, Joseph F. Bergan, Mitsuko Watabe-Uchida & Catherine G. Dulac


Mice display robust, stereotyped behaviours towards pups: virgin males typically attack pups, whereas virgin females 
and sexually experienced males and females display parental care. Here we show that virgin males genetically impaired 
in vomeronasal sensing do not attack pups and are parental. Furthermore, we uncover a subset of galanin-expressing 
neurons in the medial preoptic area (MPOA) that are specifically activated during male and female parenting, and a 
different subpopulation that is activated during mating. Genetic ablation of MPOA galanin neurons results in marked 
impairment of parental responses in males and females and affects male mating. Optogenetic activation of these neurons 
in virgin males suppresses inter-male and pup-directed aggression and induces pup grooming. Thus, MPOA galanin neurons 
emerge as an essential regulatory node of male and female parenting behaviour and other social responses. These results 
provide an entry point to a circuit-level dissection of parental behaviour and its modulation by social experience. 


Understanding how neural circuits drive social behaviour is a funda- 
mental question in neuroscience. Parental interactions aimed at the care 
and protection of young are essential for the survival of offspring in 
many animal species. Elaborate parental behaviour is a defining feature 
of mammals, presumably regulated by evolutionarily conserved neural 
circuits. Intriguingly, the respective roles of the two parents in offspring
care differ across highly related species: whereas mothers usually assume 
the largest share of parenting, the contribution of fathers varies markedly 
between species, ranging from dedicated parenting of pups to neglect 
and aggression. The identification of neuronal circuits controlling the
display of parental behaviour in males and females should help elucidate 
neural mechanisms underlying this essential social behaviour and provide 
novel insights into the regulation of sexually dimorphic brain functions. 

Insights into the neurobiology of parental behaviour come primarily
from studies in rodents. Virgin rats find foreign pups aversive but exhibit
parental care after continuous exposure to the pups, or after priming
with hormones characteristic of parturient females. In laboratory
mice, virgin males and females exhibit markedly different behaviours
towards pups. Virgin males typically attack pups, whereas virgin females
exhibit spontaneous, stereotyped displays of maternal care. Remarkably,
males stop attacking pups and transiently become paternal after
mating, starting near the time of birth of the pups and lasting until
weaning. In female rats, the medial preoptic area (MPOA) and the
dopaminergic system have been implicated in the control of maternal
behaviour. However, the neural mechanisms underlying distinct
parental behaviours in females and males with different social experience
remain unknown.


Vomeronasal control of pup-directed aggression 


The vomeronasal system plays an essential role in regulating sex-specific
behaviours. Males with impaired vomeronasal organ (VNO) signalling
mount males and females, suggesting impaired gender identification.
Further, VNO-deficient females show notable male-like mounting and
courtship displays, suggesting that the vomeronasal pathway constitutively
represses male-specific behaviour circuits in females. We proposed
that, in males, the vomeronasal pathway may similarly regulate
female-typical behaviours such as parenting. This idea is supported by
evidence that vomeronasal areas are activated during pup-directed aggression
and that disrupted VNO signalling in males reduces aggression
and facilitates parenting.

We used genetic tools to confirm the role of VNO inputs in pup-directed
behaviours. Genetic ablation of TRPC2, a VNO-specific ion
channel, impairs vomeronasal signalling. Adult Trpc2⁻/⁻ virgin males
and females and Trpc2⁺/⁻ littermates were presented with C57BL/6J
pups and behavioural responses were observed. In contrast to Trpc2⁺/⁻
littermates, Trpc2⁻/⁻ virgin males showed marked reductions in pup-directed
aggression (Fig. 1a). Furthermore, a large fraction of Trpc2⁻/⁻
virgin males exhibited parental care typical of females and fathers (Fig. 1a).
Quantification of behaviour towards pups showed that Trpc2⁻/⁻ males
retrieved pups with shorter latency, engaged in more nest-building, and
were in the nest crouching over and grooming pups longer than Trpc2⁺/⁻
males. Trpc2⁻/⁻ males, although clearly parental, displayed less parenting
than Trpc2⁻/⁻ females (Fig. 1b-f).

We next investigated the post-mating switch from attacking pups
to paternal behaviour originally reported in the CF1 mouse strain.
Virgin control males, and mated males tested 1-2 days or 10-12 days after
mating, attacked pups. However, mated males tested just before pups were
born at day 17-20 did not attack pups, with half displaying paternal
behaviour. All males tested at day 25-27 were paternal, consistent with
previous studies (Fig. 1g).

Thus, opposing behaviour circuits co-exist in the male brain to reg- 
ulate pup-directed aggression and parenting behaviours according to 
social context. In virgin males, vomeronasal circuits activated by pup 
cues elicit pup-directed aggression while pathways underlying parent- 
ing behaviour remain silent. By contrast, mated males repress VNO- 
evoked aggression and instead activate parenting circuits. 


Neuronal activation during parenting 

To identify the brain regions involved in parental care, we compared 
the brain activity patterns of virgin males versus virgin females and 
paternal males using induction of the immediate early gene c-fos (also 
known as Fos) as a read-out of neuronal activation after exposure to 
pups. We focused our analysis on the hypothalamus, amygdala and 
other regions involved in social behaviours (Methods). 


¹Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA. ²Department of Molecular and
Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts 02138, USA.


15 MAY 2014 | VOL 509 | NATURE | 325 


©2014 Macmillan Publishers Limited. All rights reserved 




[Figure 1, panels a-g: a, behaviour type (Retrieve, Ignore, Attack) by genotype and sex; b, pups retrieved versus elapsed time (s); c, time in nest; d, crouching; e, pup grooming; f, nest building; g, behaviour (Attack, Ignore, Retrieve) versus test day after mating, with the birth of pups marked.]


Figure 1 | Pup-directed behaviour of Trpc2⁻/⁻ and Trpc2⁺/⁻ virgin
animals and switch from attack to parenting in males after mating.
a, Behaviour analysis of Trpc2⁻/⁻ and Trpc2⁺/⁻ virgin males demonstrates
significantly different responses to pups in the presence or absence of VNO
signalling. Chi-square test with Bonferroni correction, **P < 0.01.
b, Combined percentage of pups (out of four) retrieved by an animal group as a
function of time. Kolmogorov-Smirnov test with Bonferroni correction,
P < 0.001 between Trpc2⁻/⁻ and Trpc2⁺/⁻ males, P < 0.01 between Trpc2⁻/⁻
males and Trpc2⁻/⁻ females. c-f, Time spent in nest (c), and duration of
crouching (d), pup grooming (e) and nest building (f). Mean ± s.e.m.;
Mann-Whitney test with Bonferroni correction, *P < 0.05, **P < 0.01,
***P < 0.001; NS, not significant. g, Behaviour of Trpc2⁺/⁻ males tested after
increasing durations of cohabitation with females subsequent to mating. Males
mated on day 0 except virgin controls, which were individually housed from
day 0 throughout the test. Male behaviour switches from attack to parenting at a
time period after mating that corresponds to the birth of their pups.


Fathers and virgin females robustly activated similar brain areas after
parental care, namely the anteroventral periventricular nucleus (AVPe;
data not shown) and the MPOA, and these regions remained consistently
silent in virgin males. Specifically, we observed striking increases
in the number of MPOA c-fos⁺ cells of maternal virgin females, Trpc2⁻/⁻
virgin males and paternal fathers (Fig. 2a-e), indicating that a common
pathway for parental behaviour exists in males and females that is
normally repressed in virgin males by vomeronasal inputs. The ventral
bed nucleus of the stria terminalis (BNST)/dorsal MPOA was shown
to play an important role in rat maternal behaviour, but also in sexual
behaviour, thermoregulation and gonadotropin-releasing hormone
(GnRH) secretion. Accordingly, we observe robust MPOA c-fos
activation after mating, medial to the area containing parenting-induced
c-fos (Fig. 2e, f).

To determine whether parenting and mating activate different MPOA 
neurons, we performed a cellular compartment analysis of temporal 
activity by fluorescent in situ hybridization (catFISH), allowing direct

Figure 2 | Parenting activates galanin-expressing neurons in the MPOA.
a-c, c-fos mRNA expression in the MPOA of virgin males (a), fathers (b) and
virgin females (c) after interaction with pups. d, Schematic illustration of the
MPOA in sagittal and coronal sections, adapted from the Paxinos and Franklin
mouse brain atlas. e, Social behaviours induce c-fos activation in the MPOA in
virgin and mated males and females. Groups are labelled as follows. C, fresh
bedding exposure; KO, Trpc2⁻/⁻; fa, father; vf, virgin female; mo, mother.
Mean ± s.e.m., one-way ANOVA followed by Bonferroni’s post test comparing
all the social interaction groups to fresh bedding control, ***P < 0.001. NS, not
significant. f, g, catFISH identifying parenting- and mating-induced c-fos in
the MPOA in males shows that the two behaviours activate largely distinct
MPOA neuronal populations. Par, Parenting; Mat, Mating; nuc, nuclear
(yellow); cyto, cytoplasmic (red). Mean ± s.e.m., one-way ANOVA followed by
Bonferroni’s post test comparing all pairs of groups, **P < 0.01. h, Co-labelling
c-fos and Gal in the MPOA of virgin females after interaction with pups.
i, j, Percentage of c-fos⁺ cells expressing Gal and percentage of Gal⁺ cells
expressing c-fos in males and females after various social interactions,
compared to the percentages of NeuN⁺ cells expressing Gal and c-fos,
respectively. Agg, Aggression. Mean ± s.e.m., t-test pairing the measurements
from each animal, adjusted by Benjamini-Hochberg procedure controlling the
false discovery rate. *P < 0.05, **P < 0.01, ***P < 0.001; NS, not significant.

comparison of two activated cell populations. Animals experiencing
the same behaviour twice showed ~70% overlap of nuclear and cytoplasmic
c-fos MPOA signals, whereas animals engaged in different behaviours
showed only 20-30% overlap, indicating that mating and parenting
activate largely distinct MPOA neuronal populations (Fig. 2f, g).

The MPOA is a highly heterogeneous structure, which receives
inputs from, and sends information to, multiple brain regions. The
identity of cell populations governing parental behaviour is unknown.
We characterized active cells in parental behaviour using double fluorescent
in situ hybridization with c-fos and a series of molecular markers
with distinct MPOA expression (Methods). We uncovered the
neuropeptide galanin (Gal) as a candidate marker for MPOA c-fos⁺
cells in virgin females, mothers, and fathers. Across all markers surveyed,
Gal showed the highest enrichment in parenting-induced c-fos⁺
MPOA cells (Extended Data Fig. 1a, b). 38.3% ± 1.6% of MPOA c-fos⁺
cells in virgin females, 43.9% ± 4.6% in mothers, and 33.4% ± 0.8% in
fathers co-express Gal (mean ± s.e.m., t-test pairing each animal, P < 0.001
for virgin females and fathers, P < 0.05 for mothers; Fig. 2h, i). Further,
24.8% ± 0.8% of MPOA Gal⁺ cells in females, 26.7% ± 1.4% in mothers,
and 16.8% ± 0.9% in fathers co-express c-fos (mean ± s.e.m., paired
t-test, P < 0.001 for virgin females and fathers, P < 0.01 for mothers;
Fig. 2j). Gal is also found in minor subsets of mating- and aggression-induced
c-fos⁺ cells in males, whereas overlap between Gal and c-fos
induced by pup-directed aggression is not significantly different from
chance level (Fig. 2i, j).

Gal is expressed in several brain areas and modulates multiple physiological
functions. Gal is also co-expressed by prolactin-secreting cells
of the pituitary and involved in lactation. We found that MPOA Gal⁺
cell number is not sexually dimorphic, although MPOA Gal expression
level is slightly higher in females than in males (Extended Data Fig. 1c, d).
Most MPOA c-fos⁺ and Gal⁺ cells express Gad1, characteristic of GABAergic
inhibitory neurons (Extended Data Fig. 1e-h).


Ablation of MPOA Gal⁺ neurons


We next investigated the requirement of MPOA Gal⁺ neurons for parental
behaviour in females and mated males. We obtained a Gal-Cre
transgenic line (GENSAT) and confirmed appropriate Cre expression in
MPOA Gal⁺ neurons: 94.6% of the Gal⁺ cells co-express Cre (n = 858
cells in 2 animals) and 94.8% of the Cre⁺ cells co-express Gal (725 cells
in 2 animals; Extended Data Fig. 2a). To specifically ablate MPOA Gal⁺
neurons, Gal-Cre mice were given bilateral MPOA injections of recombinant
adeno-associated virus (AAV) expressing Cre-dependent diphtheria
toxin A fragment (AAV-DTA) (Extended Data Fig. 2b). On average,
AAV-DTA eliminated ~60% of MPOA Gal⁺ cells, compared to Gal-Cre-negative
littermate controls receiving the same treatment (Extended
Data Fig. 2c, d). We verified that an independent MPOA cell population
expressing thyrotropin releasing hormone (Trh) was not affected by
targeted ablation (Extended Data Fig. 2e). Furthermore, neighbouring
Gal⁺ cells in the AVPe, paraventricular nucleus (PVN) and dorsomedial
hypothalamic nucleus (DMH) were unaffected, confirming the
spatial specificity of viral-mediated ablations (Extended Data Fig. 2f-h).

Virgin females with MPOA Gal⁺ neuron loss showed striking reductions
in maternal behaviour and emergence of pup-directed aggression
(Fig. 3) compared to Gal-Cre-negative littermates or Gal-Cre females
with AAV-Flex-GFP viral injections (Extended Data Fig. 3a-f). The duration
of overall maternal interaction is positively correlated with the number
of remaining Gal⁺ cells (Fig. 3a; n = 23, P < 0.05, R = 0.46). Moreover,
whereas virgin females with low ablation of MPOA Gal⁺ cells were
maternal, females with ablation efficiencies above 50% displayed loss of
maternal care with increased pup-directed aggression (Fig. 3b), accompanied
by significantly reduced crouching, nest building, retrieval to nest,
and maternal interaction compared to controls (Fig. 3c-h). Thus, MPOA
Gal⁺ cells represent an essential neuronal population for the maternal
behaviour of virgin females.
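
The significance reported for this correlation can be checked from the coefficient and sample size alone. As a rough worked example, assuming a two-tailed test in which the Pearson coefficient is converted to a t statistic with n − 2 degrees of freedom (a standard approach; the exact procedure used is not stated here), the values R = 0.46 and n = 23 give:

```latex
t = R\sqrt{\frac{n-2}{1-R^{2}}}
  = 0.46\sqrt{\frac{21}{1-0.2116}}
  \approx 0.46 \times 5.16
  \approx 2.37,
\qquad
t_{0.975,\,21} \approx 2.08
```

Because 2.37 exceeds the two-tailed 5% critical value of about 2.08 for 21 degrees of freedom, the calculation is consistent with the reported P < 0.05.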

Next, we examined the effects of MPOA Gal⁺ cell ablation on retrieving
behaviour of nursing females (Methods). Control mothers retrieved
all four pups, whereas most mothers with loss of over 50% Gal⁺ MPOA
cells failed to retrieve pups, suggesting a critical role of Gal⁺ cells in
maternal behaviour of lactating females (Extended Data Fig. 4a-c).

We then tested the requirement of Gal⁺ neurons for male parental
behaviour (Methods). As with females, disappearance of parental behaviour
in males was associated with loss of over 50% of Gal⁺ cells (Fig. 4a, b).

Figure 3 | Ablation of MPOA Gal⁺ neurons impairs maternal behaviour in
virgin females. a, Linear regression of maternal interaction and the number of
remaining MPOA Gal⁺ cells in ablated virgin females. Animals are colour
coded by their behaviour categories. Pearson correlation, n = 23, P < 0.05,
R = 0.46. b, Cumulative percentages of females that retrieved or attacked pups
as a function of the percentage of remaining Gal⁺ cells, n = 23. Reference cell
number (100%) is the average MPOA Gal⁺ cell number in the control group.
As the remaining number of Gal⁺ cells increases or decreases on the x-axis,
each female is added to the maternal group or the infanticidal group according
to its behaviour type, respectively. c, Behaviour of ablated females with over
50% ablation efficiency (n = 15) compared to control (n = 15). Chi-square test,
P < 0.05. d, Combined percentage of pups (out of two) retrieved by the ablation
group as a function of time, compared to the controls. Kolmogorov-Smirnov
test, P < 0.05. e-h, Crouching (e), pup grooming (f), nest building (g) and
maternal interaction (h). Mean ± s.e.m. Mann-Whitney test, *P < 0.05.

Figure 4 | Ablation of MPOA Gal⁺ neurons impairs paternal behaviour in
fathers. a, Linear regression of paternal interaction and number of remaining
Gal⁺ cells in the MPOA in ablated fathers. Animals are colour coded by their
behaviour categories. Pearson correlation, n = 15, P = 0.21, R = 0.34.
b, Cumulative percentages of paternal males (Retrieve) as a function of the
percentage of remaining Gal⁺ cells, n = 15. Reference cell number (100%) is the
average MPOA Gal⁺ cell number in the control group. c, Behaviour type of


Behaviour assays showed that only 14.3% of males with over 50% MPOA
Gal⁺ neuronal loss (n = 14) displayed paternal behaviour 3 weeks after
mating, compared to 75% of littermate controls (n = 12; Fisher’s exact
test, P < 0.01; Fig. 4c). Ablated animals showed deficits in crouching,
pup grooming, nest building, retrieval to nest, and overall paternal interaction
compared to controls (Fig. 4d-h).
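
The comparison of paternal behaviour between ablated and control males can be reproduced with a short calculation. The sketch below implements a two-sided Fisher's exact test using only the Python standard library. The counts (2 of 14 ablated males and 9 of 12 controls displaying paternal behaviour) are inferred from the reported percentages, and the function name `fisher_exact_two_sided` is illustrative rather than taken from the study's analysis.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that k of the col1 "positive" animals fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    probs = [prob(k) for k in range(k_min, k_max + 1)]
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


# Assumed counts: rows are ablated and control males,
# columns are paternal and non-paternal animals.
p_value = fisher_exact_two_sided(2, 12, 9, 3)
print(f"P = {p_value:.4f}")
```

Under these assumed counts the two-sided P value is roughly 0.004, which is in line with the reported P < 0.01.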

Gal⁺ cell ablation did not affect locomotion or inter-male aggression
(Extended Data Fig. 5a-f), but decreased mounting duration and
increased latency to mount (Extended Data Fig. 5g-i). This mating
defect may result from ablation of the small subset of MPOA Gal⁺ cells
activated during mating or from interactions between brain circuits controlling
parenting and mating.

To further assess the functional specificity of MPOA Gal⁺ cells in
behaviour control, we examined the effect of ablating MPOA tyrosine
hydroxylase (Th) cells using AAV-DTA in Th-IRES-Cre males. ~70%
of Th⁺ cells were ablated compared to littermate controls (Extended
Data Fig. 6a, b). The ablation was restricted to the MPOA, as the AVPe
Th⁺ cells were largely unaffected (Extended Data Fig. 6c). Although
MPOA Th⁺ cell loss was comparable to Gal⁺ cell loss (Extended Data
Fig. 6d), it did not affect parenting, mating, or inter-male aggression in
males (Extended Data Fig. 6e-o), highlighting the critical role of Gal⁺
cells in the control of parenting.

Remarkably, specific ablation of Gal⁺ cells affected all major aspects
of parental behaviour. Additionally, whereas a significant fraction of
virgin females with strong reduction in Gal⁺ neurons attacked pups,
no mated males or nursing females with high ablation efficiency displayed
pup-directed aggression. This result suggests that, in virgin females,
Gal⁺ neurons are important for both maternal behaviour and inhibition
of pup-directed aggression, whereas in fathers and mothers, mating
suppresses circuits for pup-directed aggression independently of
Gal⁺ neuronal activation.


Activation of MPOA Gal⁺ neurons


To address whether activation of MPOA Gal⁺ neurons is sufficient to
suppress pup-directed aggression and potentiate parental behaviour, 
virgin males and fathers were tested during optogenetic activation of 

ablated fathers with over 50% ablation efficiency (n = 14) compared to control
(n = 12). Fisher’s exact test, **P < 0.01. d, Combined percentage of pups
retrieved (out of two) by the ablation group as a function of time, compared to
the controls. Kolmogorov-Smirnov test, P < 0.001. e-h, Crouching (e), pup
grooming (f), nest building (g) and paternal interaction (h). Mean ± s.e.m.,
Mann-Whitney test, *P < 0.05, **P < 0.01.


Gal⁺ neurons. Gal-Cre males were given MPOA-targeted injections of
a Cre-dependent channelrhodopsin-2 fused with enhanced yellow fluorescent
protein virus (AAV-ChR2:EYFP) and implanted with an optic
fibre. Negative controls were Gal-Cre-negative littermates receiving the
same treatment. In stimulation trials, blue light was delivered to the
MPOA whenever the male contacted a pup with its snout. Post-mortem
in situ hybridization confirmed specific MPOA ChR2:EYFP expression
in Gal⁺ cells (Fig. 5a, b). ~60% of MPOA Gal⁺ cells expressed
AAV-ChR2:EYFP, similar to the expression of AAV-DTA in ablation
experiments (Extended Data Fig. 9k). Additionally, we verified that
parenting-induced c-fos⁺ and c-fos⁻ subpopulations of Gal⁺ cells showed
comparable viral infection rates (Extended Data Fig. 9k). Light stimulation
in awake behaving animals produced strong c-fos induction in
MPOA Gal⁺ cells of Gal::ChR2 males, but not control males (33.5% ± 3.3%
for Gal::ChR2 males, 6 animals; 4.1% ± 0.2% for controls, 8 animals;
mean ± s.e.m., t-test, P < 0.001).

We first investigated whether Gal⁺ cell activation reduced pup-directed
aggression. Each male was tested multiple times with stimulation (stim)
and non-stimulation (no stim) (Methods). Light stimulation of MPOA
Gal⁺ neurons in Gal::ChR2 males inhibited attacking in 16 of 18 trials
(6 animals, 2-4 trials per animal), whereas the same animals attacked
in 18 of 19 trials without stimulation (Fig. 5c, d). Loss of pup-directed
aggression was not due to pup-avoidance, as light-stimulated Gal::ChR2
virgin males displayed frequent and lengthy bouts of pup grooming not
observed in controls (Fig. 5e, f and Extended Data Fig. 7). However,
light stimulation did not significantly alter the behaviour of control virgin
males (Fig. 5c-f and Extended Data Fig. 7).

We next observed effects of light stimulation on parental behaviour 
of fathers (Methods). Light stimulation elicited strikingly elevated pup 
grooming in Gal::ChR2 compared to non-stimulated fathers (Fig. 5g, i;
Extended Data Fig. 8). Interestingly, induction of active pup grooming 
in Gal::ChR2 stimulated males was seen at the expense of crouching 
(Fig. 5h, i and Extended Data Fig. 8). 

To address the specificity of Gal⁺ cell activation in parental behaviour,
we also tested other behaviours. Gal⁺ cell activation left mating
behaviour unaffected, but diminished inter-male aggression and increased

Figure 5 | Optogenetic activation of MPOA Gal⁺ neurons in males
suppresses attack and promotes pup grooming. a, b, Co-labelling Gal and
ChR2:EYFP expression in the MPOA of Gal::ChR2 (a) and control males (b).
c, Percentage of trials with attacks of pups by virgin males. Fisher’s exact test
with Bonferroni correction, ***P < 0.001; NS, not significant. d, Percentage of
pups attacked by each group of virgin males. Gal::ChR2 stimulated (stim) trials
are significantly different from Gal::ChR2 not stimulated (no stim) and control
stimulated (control stim) trials. Kolmogorov-Smirnov test with Bonferroni
correction, P < 0.001. e, Pup grooming in the tests with virgin males.


locomotion (Extended Data Fig. 9a-g), whereas length of social contact
was equivalent in control and stimulation trials across assays (Extended
Data Fig. 9h, i). Duration of light illumination was also comparable across
all stimulation experiments (Extended Data Fig. 9j).

These results indicate that optogenetic activation of MPOA Gal⁺
cells is sufficient to suppress pup-directed aggression and induce active
pup grooming. The suppression of inter-male aggression and increased
locomotion may result from increased parenting and pup-seeking, or
from other unknown behavioural drives. Surprisingly, whereas ablation
of Gal⁺ cells leads to mating defects, activation of these cells did
not increase mating. This may reflect unknown complexity in social
circuit coding, or originate from slightly different virus infectivity in
ablation and activation experiments.


Discussion 


Our data provide significant insights into the control of opposing social 
behaviours in mice: parenting versus pup-directed aggression. Whereas 
vomeronasal circuits in virgin males mediate aggression towards pups, 
this response is silenced in females and mated males, and neuronal 
pathways underlying parental care are activated instead. We show here 
that MPOA Gal-expressing cells are critical for the control of mouse 
parental behaviour and the suppression of pup-directed aggression, 
thus acting as a central regulatory node of social interactions with pups. 
Manipulation of this genetically defined neuronal population switches 
on or off the parental behaviour of mice, providing a precious entry point 
for further dissection of neural circuits underlying parental care and their 
modulation by social experience. The functional heterogeneity among 
Gal⁺ cells, also reported in most neuropeptide-expressing neurons,
may underlie the observed partial modulation of other social behaviours. 




Mean ± s.e.m.; Mann-Whitney test with Bonferroni correction. **P < 0.01,
***P < 0.001; NS, not significant. f, Sample behaviour raster plot of Gal::ChR2
stimulated and control stimulated trials in virgin males. Note that two
behaviour elements (such as pup grooming and handling) can occur
simultaneously. g, Pup grooming in the tests of fathers. n = 8 for each group,
t-test pairing the same animal with and without light stimulation,
***P < 0.001. h, Crouching in the tests of fathers. n = 8, paired t-test,
*P < 0.05. i, Sample behaviour raster plot of Gal::ChR2 stimulated and
Gal::ChR2 not stimulated trials in tests with fathers.


A more refined characterization of Gal⁺ neuron subpopulations may
help identify subsets of MPOA neurons involved in distinct behaviours. 

Interestingly, ablation of MPOA Gal⁺ neurons leads to reductions
in all tested aspects of parenting, whereas MPOA Gal⁺ neuron activation
triggers pup grooming but no other parental displays. An understanding
of the natural pattern of MPOA Gal⁺ neuron activity during
parental interactions, particularly during intense care such as grooming
versus more passive display like huddling with pups, may help optimize
ChR2-mediated stimulation of MPOA Gal⁺ neurons and its behavioural
outcome. Additionally, although MPOA Gal⁺ neuronal activity
seems essential for parenting behaviour, some behavioural displays may
require simultaneous activation of additional neuronal populations.
Interestingly, activation of MPOA Gal⁺ neurons increases locomotion
without affecting social contact and decreases inter-male aggression, suggesting
complex functional relationships between parenting and other
behaviour circuits.

From our results, the relationship between circuits mediating parental
care and pup-directed aggression is complex and modulated by social experience.
Virgin males with activated MPOA Gal⁺ neurons do not attack
pups, indicating that these neurons suppress pup-directed aggression
directly. Indeed, loss of MPOA Gal⁺ neurons impairs parental behaviour
and elicits pup-directed aggression in virgin females. However, MPOA
Gal⁺ neuron ablation suppresses parental behaviour without facilitating
pup-directed aggression in mothers or fathers, suggesting that circuits
underlying pup-directed aggression are silenced in mated animals
through independent mechanisms. Future circuit-level analysis of MPOA
Gal⁺ neurons will help uncover mutual connections between circuits
underlying parenting, pup-directed aggression and mating, and assess
connectivity with other brain areas participating in parenting.

Finally, a variety of hormones and neuropeptides, including oestradiol,
testosterone, prolactin, progesterone and oxytocin, modulate parenting
according to the physiological state of the animal and its social
context. It will be interesting to determine if Gal, a neuropeptide
involved in modulation of many homeostatic and reproductive functions,
is a new player in the regulation of parental behaviour.


METHODS SUMMARY 


Behavioural analysis of parental behaviour was performed as described in the methods 
section. In situ hybridization and catFISH were performed as previously described. 
Targeted ablation was performed by injecting the MPOA bilaterally in Gal-Cre 
animals with AAV-DTA. For cell activation, AAV-ChR2::YFP was injected bilat- 
erally into the MPOA of Gal-Cre animals and an optical fibre was implanted, allow- 
ing stimulation using blue light. Details of the experimental setup are provided in 
the Methods. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS


Animals. Animals were maintained on a 12 h:12 h light/dark cycle (lighted hours:
02:00-14:00) with food and water available ad libitum. Animal care and experiments
were carried out in accordance with the NIH guidelines and approved by
the Harvard University Institutional Animal Care and Use Committee (IACUC).

Trpc2 knockout mice of C57BL/6J × 129/Sv mixed genetic background were
generated previously in our laboratory. The complete null allele of the Trpc2 gene
locus was confirmed by western blotting.

The Gal-Cre BAC transgenic line (STOCK Tg(Gal-cre)KI87Gsat/Mmucd, 031060- 
UCD) was imported from the Mutant Mouse Regional Resource Center. In this 
line, a Cre recombinase cassette followed by a polyadenylation sequence is inserted 
at the ATG codon of the first coding exon of the Gal gene. The imported line was in 
an FVB/N-Crl:CD1(ICR) mixed genetic background and backcrossed to C57BL/6J
genetic background in our breeding colony. The animals used in the study came 
from the F1 generation. 

The Th-IRES-Cre knock-in line was imported from the European Mouse Mutant
Archive (00254). An IRES-Cre construct was inserted in the 3’ untranslated end of
the Th gene. The Th expression is not affected and Cre protein is produced in
Th-expressing cells. This line was generated originally in a mixed genetic background
of 129/SvJ and C57BL/6J and then backcrossed to C57BL/6J.

Behaviour assay. Before behaviour tests animals were housed individually for 
about one week. Experiments started at the beginning of the dark phase and were 
performed under dim red light, unless noted otherwise. Each test was videotaped 
(Sony DCR-HC65 camcorder in nightshot mode, Microsoft LifeCam HD-5000 or 
Geovision surveillance system) and the behaviours were scored by an individual 
blind to the genotype using the Observer 5.0 or XT 11 software (Noldus Information
Technology). When an animal was tested in multiple behaviour assays, it
was allowed at least 48 h of rest between tests.

Parental behaviour assay of Trpc2 knockout animals. 2- to 4-month-old Trpc2⁺/⁻
and Trpc2⁻/⁻ virgin male and female littermates were individually housed for
approximately one week before the test. 1- to 3-day-old naive C57BL/6J pups were used as
the standard pup intruder in all the behaviour assays performed in this study. The
pups are of a different strain from the Trpc2⁺/⁻ and Trpc2⁻/⁻ animals and therefore
are not related to the resident animals. The pregnant females were separated
from the stud before parturition, so the pups were not exposed to their fathers and did
not carry any adult male odour. Four naive C57BL/6J pups were introduced to the
home cage of each animal and placed at the farthest corner from the resident’s
resting nest. The first olfactory investigation marked the beginning of the assay,
which then extended until 30 min after all the pups were retrieved, or until the resident
attacked and wounded the pups, or for 30 min if neither of the above occurred.
When a pup was attacked, the assay was ended immediately and the wounded pup
was euthanized.

The behaviour of the animals was categorized based on the following criterion: 
animals that retrieved all the pups to the nest or built a new nest around the pups 
within 30 min and crouched over pups were categorized as ‘Retrieve’. Animals that 
attacked the pups within 30 min were scored as ‘Attack’. All the other animals were 
categorized as ‘Ignore’. In most of the cases, retrieving is an all-or-none event such 
that if an animal retrieves one pup, it retrieves all the pups. An animal is scored 
as ‘Ignore’ if it does not retrieve all four pups or does not crouch over them after 
retrieval. Following IACUC guidelines, behaviour assays must be stopped before 
animals attacking pups are able to kill them. Thus, to accurately describe the 
attack behaviour, we mainly used ‘pup-directed aggression’ or ‘attack’ instead of 
‘infanticide’. 
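
The categorization rule above can be written out as a short sketch. This is only an illustration under stated assumptions: the record type `PupAssay` and its field names are hypothetical, and giving 'Attack' priority over 'Retrieve' is an assumption based on the statement that an assay ended as soon as a pup was attacked.

```python
from dataclasses import dataclass
from typing import Optional

ASSAY_WINDOW_S = 30 * 60  # 30-min scoring window described in the text


@dataclass
class PupAssay:
    """Hypothetical per-animal record; field names are illustrative only."""
    pups_presented: int
    pups_retrieved: int                # pups retrieved (or nested around) within 30 min
    crouched: bool                     # crouched over the pups after retrieval
    attack_latency_s: Optional[float]  # None if the animal never attacked


def categorize(assay: PupAssay) -> str:
    """Return 'Attack', 'Retrieve' or 'Ignore' following the stated criteria."""
    # Assumption: an attack ends the assay, so it is checked first.
    if assay.attack_latency_s is not None and assay.attack_latency_s <= ASSAY_WINDOW_S:
        return "Attack"
    # 'Retrieve' requires all pups retrieved AND crouching over them.
    if assay.pups_retrieved == assay.pups_presented and assay.crouched:
        return "Retrieve"
    # Everything else, including partial retrieval, is 'Ignore'.
    return "Ignore"
```

Under this sketch an animal that retrieves three of four pups, or retrieves all four without crouching, is scored as 'Ignore', in line with the description above.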

The following behaviours were scored: latency to retrieve each pup (picking up 
a pup with its mouth and carrying it to the nesting area), latency to attack (biting a 
pup, often accompanied by actual wounds on the pup and confirmed immedi- 
ately after the test), grooming (sniffing and licking a pup), crouching (extending 
its limbs, assuming a nursing-like posture and huddling over at least 2 pups), nest 
building (collecting and arranging nesting material and making a nest), time spent 
in the nest and parental interaction (‘maternal interaction’ for females and ‘paternal 
interaction’ for males; calculated as the cumulative time spent crouching, grooming 
pups, and nest-building). Grooming, crouching, time in the nest and nest build- 
ing were scored as duration during the 30-min recording after all the pups were 
retrieved. The latencies to retrieve or attack pups were recorded in seconds. Some 
behavioural variability is observed in control animals across various experiments 
due to the different genetic backgrounds of the transgenic lines used in each 
experiment. Trpc2+/− females are in a C57BL/6J × 129/Sv mixed genetic background. 
Gal-Cre animals were originally in an FVB/N-Crl:CD1 (ICR) mixed genetic background 
and were backcrossed to C57BL/6J in our breeding colony. Gal-Cre virgin females 
used in the study were from an F1 generation, and exhibited a lower level of maternal 
behaviour than Trpc2+/− virgin females. 

Parental behaviour assay for mated males (Fig. 1g). Trpc2+/− virgin males were 
individually housed and then paired with females, which were checked daily for vaginal 
plugs in the next few days. Once a plug was spotted, the day was marked as day 0 for 
the mating pair and that pair was randomly assigned to a group for a different length 
of cohabitation (1-2 days, 10-12 days, 17-20 days or 25-27 days). According to 
their group, the males were tested one day after the females and their litters (if any) 
were removed from their home cage. For example, animals tested on day 1 were 
separated from their mates on day 0. An animal tested on day 20 was separated 
from its mate on day 19 and was not exposed to its own litter. The negative controls 
for this assay were individually housed Trpc2+/− virgin males. 

Mating behaviour assay. Receptive virgin females (as determined by vaginal smear), 
~8 weeks old and of C57BL/6J background, were introduced to the resident mouse 
cage. Each test ran for 15 min and was videotaped and scored for the following 
parameters: sniffing, mounting and mounting with pelvic thrust. 

Inter-male aggression assay. Castrated males (~8 weeks old) of C57BL/6J 
background (castration performed by the Jackson Laboratory), swabbed with 50 µl fresh 
urine from intact wild-type males, were introduced to the resident mouse cage. 
Each 15-min test was videotaped and scored for the following parameters: attack, 
sniffing and grooming the intruder. 

Open field test. Animals were tested for 5 min in a 60 cm × 60 cm square open 
arena under normal lighting. The position of the animals was tracked and analysed 
by Ethovision XT 8 software to calculate the distance moved, average velocity and 
the time spent in the centre zone. The centre zone was defined as the centre square 
(42 cm × 42 cm), which comprises about 50% of the total area. 
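
As a rough illustration of the three open-field measures, the sketch below computes them from a list of tracked positions. It is not the Ethovision implementation; the function names, the coordinate convention (origin at one arena corner, positions in cm) and the fixed sampling interval are assumptions made for the example.

```python
import math

ARENA_CM = 60.0   # side of the square arena
CENTRE_CM = 42.0  # side of the centre square


def centre_fraction() -> float:
    """Fraction of the arena area occupied by the centre zone."""
    return CENTRE_CM ** 2 / ARENA_CM ** 2


def summarize_track(xy, dt_s):
    """Summarize a track given as [(x_cm, y_cm), ...] sampled every dt_s seconds."""
    lo = (ARENA_CM - CENTRE_CM) / 2.0  # centre zone spans [lo, hi] on both axes
    hi = lo + CENTRE_CM
    # Path length as the sum of straight-line steps between samples.
    distance = sum(math.dist(p, q) for p, q in zip(xy, xy[1:]))
    duration = dt_s * (len(xy) - 1)
    # Each sample except the last stands for one interval of length dt_s.
    centre_time = dt_s * sum(
        1 for x, y in xy[:-1] if lo <= x <= hi and lo <= y <= hi
    )
    return {
        "distance_cm": distance,
        "mean_velocity_cm_s": distance / duration if duration > 0 else 0.0,
        "centre_time_s": centre_time,
    }
```

With these dimensions `centre_fraction()` evaluates to 0.49, which matches the stated figure of roughly half of the arena.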

RNA in situ hybridization. Fresh brain tissues were collected from animals housed 
in their home cage or 35 min after the start of the behaviour tests when c-fos expres- 
sion is analysed. For social behaviour induced c-fos analysis, the behaviour para- 
digm is generally as described in the Behaviour assay section. Only animals that 
actually displayed a certain behaviour were selected, that is, males that displayed 
mounting behaviour or females that were mounted were selected for mating induced 
c-fos analysis, males that attacked intruder for inter-male aggression induced c-fos 
analysis, animals that crouched over pups in a nest for parenting induced c-fos ana- 
lysis, and males that attacked pups for c-fos induced by pup-directed aggression. 
The dissected brains were embedded in OCT (Tissue-Tek) and frozen with dry ice. 
20-µm cryosections were used for mRNA in situ hybridization. Adjacent sections 
from each brain were usually collected over a few replicate slides to generate copies 
for staining with multiple probes. 

Fluorescent mRNA in situ hybridization was performed largely as described”. 
Complementary DNA of c-fos, Gal, Trh, Th, Gad1, Vglut2, EYFP, GFP, ChR2, Cre, 
mCherry mRNA and other MPOA molecular markers (Esr1, Esr2, Cyp19a1, Ar, 
Pgr, Prlr, Hcrt, Cart, Tac1, Penk, Bdnf, Peg10, Pvalb, Calb1, Calb2, Vip, Nos1, Cck, 
Sst, Nts, NR5a1, Npy) were cloned in approximately 800-base-pair (whenever 
possible) segments into pCRII-TOPO vector (Invitrogen). Antisense complementary 
RNA (cRNA) probes were synthesized with T7 or Sp6 polymerases (Promega) and 
labelled with digoxigenin (DIG; Roche), fluorescein (FITC; Roche) or dinitrophenol 
(DNP; PerkinElmer). Where necessary and possible, a cocktail of 2-4 probes was 
generated covering different segments of the target mRNA to maximize strength 
of signal. 

mRNA hybridization was performed with 0.5-1.0 ng µl⁻¹ cRNA probes at 68 °C. 
The probes were detected using horseradish peroxidase (POD)-conjugated anti- 
bodies (anti-FITC-POD at 1/250 dilution, Roche; anti-DIG-POD at 1/500 dilution, 
Roche; anti-DNP-POD at 1/100 dilution, PerkinElmer). The signals were amplified 
using biotin-conjugated tyramide (PerkinElmer) and subsequently visualized with 
Alexa Fluor 488-conjugated streptavidin or Alexa Fluor 568-conjugated strepta- 
vidin (Invitrogen), or directly visualized with TSA plus cyanine 3 system, TSA plus 
cyanine 5 system or TSA plus Fluorescein system (PerkinElmer). Tissues were mounted 
with Vectashield (Vector labs) containing 8 µg ml⁻¹ 4′,6-diamidino-2-phenylindole. 

For catFISH, animals were subject to two 5-min episodes of behaviours inter- 
leaved with a 30 min interval, and were euthanized immediately after the second 
episode. The c-fos cytoplasmic signal induced by the first behaviour episode was 
compared to the c-fos nuclear signal induced by the second, allowing direct com- 
parison of the two activated cell populations. The same cRNA c-fos probes described 
above were used to detect cytoplasmic signal as well as nuclear signal, and an intron 
probe”! containing the first intron of the c-fos gene was used to detect only the 
nuclear signal. 

Immunohistochemistry. Immunohistochemistry was performed according to 
standard protocols. NeuN was detected with primary antibody Mouse Anti-NeuN 
(1:3,000; Millipore, MAB377) and then amplified by Alexa Fluor 555 donkey anti- 
mouse IgG (1:500; Life Technologies). 

Image analysis and cell counting. All the microscopy images were acquired with 
AxioImager Z2 and AxioVision software with a ×10 objective (Zeiss). Brain areas 
were determined based on landmark structures and white matter, such as the 
ventricles, anterior commissure and optic tract, with the occasional assistance of Nissl 
staining and other area-specific molecular markers on adjacent sections when 
necessary. Areas of interest in the c-fos expression analysis included the MPOA, 
anteroventral periventricular nucleus, bed nucleus of stria terminalis, medial amygdala, 
posteromedial cortical amygdala, nucleus accumbens, lateral septal nucleus, 
suprachiasmatic nucleus, paraventricular nucleus, anterior basomedial nucleus, ventromedial 
hypothalamic nucleus and dorsomedial hypothalamic nucleus. After manual 
assignment of brain structures, automated cell counting was performed using ImageJ with 
custom-written macro scripts. Sample images were manually counted by 
experimenters blind to the test condition to verify the reliability of automated cell 
counting. For a given brain area, the absolute cell number was determined by summing 
up the cell counts of all the sections deemed as part of that area, adjusted by the 
number of the slicing replicates collected in cryosectioning. 
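
One plausible reading of this adjustment is sketched below. The assumption, which the text does not spell out, is that adjacent sections were distributed over several replicate slide series, so the counts from one stained series are scaled up by the number of series. The function name and arguments are hypothetical.

```python
def absolute_cell_number(section_counts, n_replicate_series):
    """Estimate the cell number for one brain area.

    section_counts: per-section counts from ONE stained replicate series.
    n_replicate_series: number of replicate slide series collected.
    Assumption: each series samples every n-th section, so the summed
    count is multiplied by the number of series.
    """
    if n_replicate_series < 1:
        raise ValueError("n_replicate_series must be at least 1")
    return sum(section_counts) * n_replicate_series
```

For example, hypothetical counts of 12, 30 and 25 cells on three sections from one of four series would give an estimate of 268 cells for the area.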

Targeted cell ablation in the MPOA. The rAAV8/EF1α-mCherry-Flex-dtA (AAV-DTA) 
construct was generated using the A subunit of the diphtheria toxin gene 
from a PGKdtabpA plasmid (Addgene plasmid 13440)*”. The recombinant vectors 
were then serotyped with AAV8 coat proteins and packaged by the viral vector core 
at the University of North Carolina. AAV-DTA (4 × 10¹² viral particles ml⁻¹) was 
injected bilaterally in the MPOA of Gal-Cre or Th-IRES-Cre males in the amount 
of 0.8 µl on each side (Bregma: 0.0 mm, midline: ±0.5 mm; dorsal surface: −5.0 mm) 
with Nanoject II injector (Drummond Scientific). The negative control for Gal+ 
cell ablation consisted of Cre-negative littermates receiving the same treatment. In the cell 
ablation of nursing mothers, one animal injected with AAV-Flex-taCasp3-TEVp” 
(3 × 10¹² viral particles ml⁻¹) to achieve better ablation efficiency was included in 
the data. 

The AAV-CAG-Flex-GFP (AAV-GFP) construct was developed by E. Boyden 
and it was packaged in serotype 8 by viral vector core at the University of North 
Carolina. AAV8-GFP (6 X 10’ viral particles ml” ') was injected in the same manner 
as described above in Gal-Cre+ animals as controls for Gal+ and Th+ cell ablation. 
It was also used to assess the infection rate of the MPOA Gal+ and 
parenting-induced c-fos+ cells, since AAV-DTA infection leads to cell death and prevents an 
accurate estimation. To test the infection rates, Gal-Cre females with AAV-GFP injec- 
tions were subject to a standard parental assay and then analysed by Gal/c-fos/GFP 
triple mRNA in situ hybridization. 

For parental behaviour, virgin females were allowed about 4 weeks of recovery, 
enabling optimal DTA expression and cell ablation before behaviour testing. Each 
female was individually housed and tested with two C57BL/6 pups, in a similar 
manner as described earlier. Retrieving, attacking, crouching, pup grooming, nest 
building and overall maternal interaction were scored. For parental behaviour test 
of the fathers, males were allowed about one week of recovery after surgery and 
then paired with females until the females gave birth (~3 weeks). 1-2 days after 
the pups were born, males were separated from their mates and litters, individually 
housed for 2-3 days and tested in a 30-min behaviour assay with two C57BL/6J 
pups. Retrieving, attacking, crouching, pup grooming, nest building and overall 
paternal interaction were scored. For mothers, females were allowed about one 
week of recovery after injection and then paired with males, which were removed 
from the females about 1 week before term. On P0, after removing the litters from a 
mother, 4 of the pups were re-introduced into the cage and retrieving behaviour 
was observed for 10 min. The brains were collected after behaviour assays for his- 
tological analysis. 

ChR2-mediated cell activation. The AAV-EF1α-DIO-hChR2(H134R):EYFP (AAV-ChR2:EYFP) 
construct was a gift of K. Deisseroth and the recombinant AAV vectors 
were serotyped with AAV5 coat proteins and packaged by the viral vector core at 
the University of North Carolina. Gal-Cre males were tested with pups and those 
that attacked pups were selected for surgery. 0.8 µl of AAV-ChR2 (4 × 10¹² viral 
particles ml⁻¹) was injected bilaterally into the MPOA of Gal-Cre males (Bregma: 
0.0 mm, midline: ±0.5 mm; dorsal surface: −5.0 mm) using Nanoject II injector 
(Drummond Scientific). After injection, a small plastic adaptor holding an optical 
fibre (300-µm diameter; Polymicro technologies) was implanted above the MPOA 
and affixed to the skull with dental cement (Bregma: 0.0 mm, midline: +0.2 mm; 
dorsal surface: −4.2 mm). The implant was positioned close to the midline to cover 
the MPOA in both hemispheres and lowered to a depth of approximately 0.8 mm 
above the centre of the AAV injection. A threaded plastic cap (Plastics One) was 
used to cover the implant during recovery and between experiment sessions. Gal- 
Cre-negative males treated with the same procedure were the negative controls. 

The males were tested after at least 2 weeks of recovery. Before stimulation, the 
implant was connected to an optical fibre (300-µm diameter, Polymicro 
technologies), which was connected in turn to a blue laser via an optical commutator 
permitting free movement of the animals. The optic fibre was flexible and long enough 
to allow the animal to freely behave and interact with the intruder. Both Gal::ChR2 
and control animals were tested for 2-4 trials with stimulation (stim) and 
non-stimulation (no stim) trials randomly assigned in 1:1 ratio. In each trial, one C57BL/6J 
pup was introduced to the male’s home cage to minimize the number of pups 
used in this assay, as most of the males are likely to attack pups. Blue light (473 nm) 
was delivered in 30-ms pulses at 20 Hz for 1-4 s whenever the male contacted the 
pup with its snout. The light power exiting the fibre tip was ~10-20 mW, 
ensuring a light intensity above ~1.0 mW mm⁻² over the entire MPOA. There was 
almost no leakage of light from the optic fibre or the adaptor. Each trial was up to 
5 min but when the male attacked and wounded the pup, the trial was ended and 
the pup was euthanized immediately. The following behaviours were scored and 
quantified: pup grooming (the male sniffs or licks the pup), handling (the male holds 
the pup with two forepaws), aggression (the male grabs the pup violently and 
attempts to bite, which usually does not wound the pup but causes it to struggle and 
make distress calls) and pup distress calls (only audible calls were recorded). 
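
The stimulation pattern described above (30-ms pulses at 20 Hz in trains of 1-4 s) can be made concrete with a small sketch. The function names are hypothetical and the code only restates the timing arithmetic; it says nothing about how the laser was actually driven.

```python
def pulse_train(train_s, rate_hz=20.0, pulse_ms=30.0):
    """Return (onset_s, offset_s) pairs for one train of light pulses."""
    period_s = 1.0 / rate_hz      # 50 ms between pulse onsets at 20 Hz
    width_s = pulse_ms / 1000.0   # 30-ms pulse width
    n_pulses = round(train_s * rate_hz)
    return [(i * period_s, i * period_s + width_s) for i in range(n_pulses)]


def duty_cycle(rate_hz=20.0, pulse_ms=30.0):
    """Fraction of each period during which the light is on."""
    return rate_hz * pulse_ms / 1000.0
```

With the stated parameters a 1-s train contains 20 pulses, a 4-s train contains 80, and the light is on for 60% of each 50-ms period.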

For paternal behaviour assays, the Gal::ChR2 and the control males were paired 
with females. After their pups were born, the females and the pups were removed 
and the males were tested in their home cage by introducing two C57BL/6J pups. 
Each male was tested in two 10-min trials with one stimulation and one 
non-stimulation trial in randomized order. Blue light was delivered when the males sniffed 
or licked the pups. None of the males attacked pups or displayed obvious aggression. 
Retrieving, pup grooming, crouching and nest building behaviours were scored 
and quantified as described above. 

After behaviour assays, the brain tissues of these animals were collected after a 
standard c-fos induction protocol to analyse the efficiency of viral infection and 
cell activation. A train of light was delivered in 30-ms pulses at 20 Hz for 2 s, repeated 
every 10 s for 15 min, at experimental light intensity. Co-labelling between Gal, 
ChR2:EYFP and c-fos was analysed by mRNA in situ hybridization. Two Gal::ChR2 
animals with less than 20% of MPOA Gal+ cells expressing c-fos were discarded from 
the group. The fibre implants from both Gal::ChR2 and control animals were 
verified for efficient light transmission. 

Statistics. The sample sizes in our study were chosen based on common practice 
in animal behaviour experiments. Data were first tested with the Lilliefors test for 
normality. If the null hypothesis that the data come from a normal distribution could not 
be rejected, Student’s t-test was used. Otherwise, the Mann-Whitney test was used. 
Owing to the strong non-normality of the behaviour data, the Mann-Whitney test was 
used for all the behaviour analysis. For categorical data, Fisher’s exact test was used. 
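
For readers unfamiliar with the categorical test, a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2 × 2 table follows. The software actually used for the analysis is not stated here, so this is only an illustration; the function name is hypothetical and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum tables that are as likely or less likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```

As a purely hypothetical example, a table in which 3 of 4 animals in one group and 1 of 4 in the other show a behaviour gives P = 34/70, or about 0.49.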

Extended Data Figure 1 | Identification of Gal as a marker for cells involved 
in parenting and characterization of MPOA Gal+ cells. a, Enrichment 
ratio of markers in parenting induced MPOA c-fos in virgin females. The 
enrichment ratio of a given marker is calculated as the percentage of the c-fos+ 
cells co-expressing the marker, divided by the percentage of NeuN+ cells 
co-expressing this marker. b, The percentages of parenting induced MPOA 
c-fos+ cells co-expressing markers and the percentages of marker cells 
co-expressing c-fos. c, Percentages of Gal+ cells in the MPOA in virgin and 
in MPOA Gal+ cell representation. Mean ± s.e.m., one-way ANOVA, P > 0.2. 
d, Fold increase of Gal mRNA in situ staining intensity compared to 
background in virgin females, virgin males and fathers. Gal mRNA expression 
is slightly higher (10% increase) in females than in males. Mean ± s.e.m., 
one-way ANOVA, ***P < 0.001, NS, not significant. e, f, Percentages of c-fos+ 
cells co-expressing Gad1 in fathers and virgin females. ND, not determined. 
Mean ± s.e.m., t-test, **P < 0.01. g, h, Percentages of Gal+ cells co-expressing 
Gad1 in virgin males, fathers and virgin females. Mean ± s.e.m., one-way 
ANOVA, P > 0.1. 
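
The enrichment ratio defined in panel a can be expressed as a one-line calculation. The sketch below uses hypothetical argument names and assumes raw cell counts as inputs.

```python
def enrichment_ratio(cfos_with_marker, cfos_total, neun_with_marker, neun_total):
    """Fraction of c-fos+ cells expressing a marker, divided by the
    fraction of NeuN+ cells expressing the same marker."""
    frac_cfos = cfos_with_marker / cfos_total
    frac_neun = neun_with_marker / neun_total
    return frac_cfos / frac_neun
```

A ratio above 1 means the marker is over-represented among activated cells relative to neurons in general; with hypothetical counts of 40/100 c-fos+ cells and 200/1,000 NeuN+ cells the ratio is 2.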

Extended Data Figure 2 | Targeted Gal+ cell ablation in the MPOA. 
a, Co-labelling of Gal and Cre expressing cells by mRNA in situ hybridization in 
Gal-Cre females indicates near perfect overlap. b, Schematic map of the 
Cre-dependent AAV-DTA virus; DTA is doubly flanked by two sets of 
incompatible lox sites and inverted to enable transcription after Cre-mediated 
recombination. c, Gal mRNA expression in the MPOA of ablated and control 
males. d, Number of MPOA Gal+ cells in ablation group compared to controls. 
Mean ± s.e.m., t-test, ***P < 0.001. e, Number of MPOA Trh+ cells in the 
ablation group and control. Mean ± s.e.m., t-test, P > 0.2. f-h, Gal+ cell 
numbers in the AVPe (f), anterior part of the PVN (g) and the DMH (h) in 
MPOA targeted ablation compared to control. Mean ± s.e.m., t-test, P > 0.1. 

Extended Data Figure 3 | Females with MPOA Gal+ cell ablation compared 
to Gal-Cre+ controls injected with AAV-Flex-GFP. a, Behaviour of MPOA 
Gal+ cell ablated virgin females with over 50% ablation efficiency (n = 15) 
compared to Gal-Cre+ controls injected with AAV-Flex-GFP (n = 13). 
Chi-square test, P < 0.05. b, Percentage of pups retrieved by Gal+ cell ablated 
virgin females as a function of time compared to the controls. The retrieving 
data of the two pups in each test are combined. Kolmogorov-Smirnov test, 
P < 0.05. c-f, Crouching (c), pup grooming (d), nest building (e) and maternal 
interaction (f) in the Gal+ cell ablated virgin females and control. 
Mean ± s.e.m. Mann-Whitney test, *P < 0.05, **P < 0.01, ***P < 0.001. The 
control females with the longest crouching and nest building durations are 
different individuals. 

©2014 Macmillan Publishers Limited. All rights reserved 

Extended Data Figure 4 | Deficits in retrieving behaviour of mothers with 
MPOA Gal+ cell ablation. a, Behaviour of MPOA Gal+ cell ablated mothers 
(n = 8) compared to controls (n = 8). Fisher’s exact test, P < 0.05. b, Number 
of pups retrieved by each mother. Mean ± s.e.m. Mann-Whitney test, 
*P < 0.05. c, Percentage of pups retrieved by the ablation group as a function of 
time compared to the controls. The retrieving data of the four pups in each test 
are combined. Kolmogorov-Smirnov test, P < 0.001. 

Extended Data Figure 5 | Mating, inter-male aggression and locomotor 
activity of MPOA Gal+ cell ablated fathers. a-c, Locomotor behaviour of 
MPOA Gal+ cell ablated and control fathers in a 5 min test in an open arena, 
measuring the distance moved (a), time spent in the centre zone (b) and the 
average velocity (c). Mean ± s.e.m., t-test, P > 0.3. d-f, Inter-male aggression of 
MPOA Gal+ cell ablated and control fathers, measuring duration of attack 
(d), latency to attack (e) and duration of grooming the intruder (f). 
Mean ± s.e.m. Mann-Whitney test, P > 0.2. g-i, Duration of mounting 
(g), latency to mount (h) and duration of mounting with pelvic thrust (i) of 
MPOA Gal+ cell ablated fathers compared to controls. Mean ± s.e.m. 
Mann-Whitney test, *P < 0.05. 

Extended Data Figure 6 | Parenting, mating and inter-male aggression of 
MPOA Th+ cell ablated fathers. a, Th mRNA expression in the MPOA of 
Th+ cell ablated and control fathers. b, Number of MPOA Th+ cells in ablation 
group compared to controls. Mean ± s.e.m., t-test, ***P < 0.001. c, Number of 
AVPe Th+ cells in MPOA targeted ablation. Mean ± s.e.m., t-test, P = 0.07. 
d, The number of MPOA Th+ cells lost compared to the Gal+ cell ablation 
experiments. One male had a failed Th+ cell ablation and was removed from 
the data set hereafter. The Th+ cell loss is ~87% of the Gal+ cell loss. 
e, Behaviour type of MPOA Th+ cell ablated fathers compared to controls. 
Fisher’s exact test, P > 0.6. f, Combined percentage of pups (out of two) 
retrieved by the Th+ cell ablation group as a function of time compared to the 
controls. Kolmogorov-Smirnov test, P > 0.9. g-i, Crouching (g), pup 
grooming (h) and nest building (i) in the Th+ cell ablated fathers and control. 
Mean ± s.e.m. Mann-Whitney test, P > 0.2. The control male with the longest 
pup grooming also has the longest nest building activity, but not the longest 
duration of crouching. j-l, Duration of mounting (j), latency to mount (k) and 
duration of mounting with pelvic thrust (l) of MPOA Th+ cell ablated males 
compared to control in a mating assay. Mean ± s.e.m. Mann-Whitney test, 
P > 0.3. m-o, Duration of attack (m), latency to attack (n) and duration of 
grooming the intruder (o) in MPOA Th+ cell ablated males compared to 
control in an inter-male aggression assay. Mean ± s.e.m. Mann-Whitney test, 
P > 0.3. 

Extended Data Figure 7 | Behaviour raster plot of Gal::ChR2 and control 
virgin males with and without light illumination. Each row represents a 
single trial lasting for 5 min or until the male attacked the pup. Trials are 
grouped by experiment conditions and sorted by trial length. Roman numerals 
indicate the sample trials shown in Fig. 5f. Various elements of the behaviour 
are colour coded and labelled in the insert. 

Extended Data Figure 8 | Behaviour raster plot of mated Gal::ChR2 and 
control males with and without light illumination. Each row represents a 
10-min trial. Trials are grouped by experiment conditions. Roman numerals 
indicate the sample trials shown in Fig. 5i. Various elements of the behaviour 
are colour coded and labelled in the insert. 



Extended Data Figure 9 | Mating, inter-male aggression and locomotor 
activity of virgin males with MPOA Gal+ cell activation and controls of light 
stimulation and viral infection. a-c, Duration of mounting (a), latency to 
mount (b) and duration of mounting with pelvic thrust (c) in virgin males with 
Gal+ cell activation compared to controls in a mating assay. Paired t-test, 
P > 0.7. d-f, Duration of attack (d), latency to attack (e) and duration of 
grooming the intruder (f) in virgin males with Gal+ cell activation compared to 
controls in an inter-male aggression assay. Paired t-test, *P < 0.05; NS, not 
significant. g, Distance moved in virgin males with Gal+ cell activation 
compared to controls. Paired t-test, ***P < 0.001. h, i, Time spent sniffing the 
intruder in mating (h) and inter-male aggression (i) assay. Paired t-test, P > 0.6. 
j, The duration of light stimulation in each behaviour test as a percentage of 
the total trial length. Mean ± s.e.m., one-way ANOVA, P > 0.6. k, The 
percentages of Gal+ and Gal+/c-fos+ cells co-expressing fluorescent protein, 
in females injected with AAV5-Flex-ChR2-EYFP or AAV8-Flex-GFP after 
maternal interaction with pups. Mean ± s.e.m., two-way ANOVA examining 
the differences in the infection of the two viruses and the two cell populations, 
P > 0.2 for both factors and the interaction between them. 

Space-time wiring specificity supports 
direction selectivity in the retina 


Jinseop S. Kim, Matthew J. Greene, Aleksandar Zlateski, Kisuk Lee, Mark Richardson, Srinivas C. Turaga, 
Michael Purcaro, Matthew Balkam, Amy Robinson, Bardia F. Behabadi, Michael Campos, Winfried Denk, 
H. Sebastian Seung & the EyeWirers 


How does the mammalian retina detect motion? This classic problem in visual neuroscience has remained unsolved for 
50 years. In search of clues, here we reconstruct Off-type starburst amacrine cells (SACs) and bipolar cells (BCs) in serial 
electron microscopic images with help from EyeWire, an online community of ‘citizen neuroscientists’. On the basis of 
quantitative analyses of contact area and branch depth in the retina, we find evidence that one BC type prefers to wire 
with a SAC dendrite near the SAC soma, whereas another BC type prefers to wire far from the soma. The near type is 
known to lag the far type in time of visual response. A mathematical model shows how such ‘space-time wiring 
specificity’ could endow SAC dendrites with receptive fields that are oriented in space-time and therefore respond 
selectively to stimuli that move in the outward direction from the soma. 


Compared to cognitive functions such as language, the visual detection 
of motion may seem trivial, yet the underlying neural mechanisms have 
remained elusive for half a century'*. Some retinal outputs (ganglion 
cells) respond selectively to visual stimuli moving in particular direc- 
tions, whereas retinal inputs (photoreceptors) lack direction selectivity 
(DS). How does DS emerge from the microcircuitry connecting inputs 
to outputs? 

Research on this question has converged upon the SAC (Fig. 1a, b). 
A SAC dendrite is more strongly activated by motion outward from 
the cell body to the tip of the dendrite, than by motion in the opposite 
direction’. Therefore a SAC dendrite exhibits DS, and outward motion 
is said to be its ‘preferred direction’. Note that it is incorrect to assign 
a single such direction to a SAC, because each of the cell’s dendrites 
has its own preferred direction (Fig. 1a). DS persists after blocking 
inhibitory synaptic transmission’, when the only remaining inputs to 
SACs are BCs, which are excitatory. As the SAC exhibits DS but its BC 
inputs exhibit little or none’, DS appears to emerge from the BC-SAC 
circuit. 

Mouse BCs have been classified into multiple types®, with different 
time lags in visual response’*®. Motion is a spatiotemporal phenomenon: 
an object at one location appears somewhere else after a time delay. 
Accordingly, DS might arise because different locations on the SAC 
dendrite are wired to BC types with different time lags. More specif- 
ically, we propose that the proximal BCs (wired near the SAC soma) lag 
the distal BCs (wired far from the soma). 

Such ‘space-time wiring specificity’ could lead to DS as follows (Fig. 1c). 
Motion outward from the soma will activate the proximal BCs followed 
by the distal BCs. If the stimulus speed is appropriate for the time lag, 
signals from both BC groups will reach the SAC dendrite simultaneously, 
summing to produce a large depolarization. For motion inward towards 
the soma, BC signals will reach the SAC dendrite asynchronously, caus- 
ing only small depolarizations. Therefore the dendrite will ‘prefer’ out- 
ward motion, as observed experimentally’. 
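
The coincidence argument can be illustrated with a toy calculation. This is not the mathematical model referred to in the abstract; it is a minimal sketch with two point inputs, Gaussian-shaped responses, and hypothetical numbers for spacing, speed and time lags.

```python
import math


def arrival_gap(direction, dx_um, speed_um_s, lag_prox_s, lag_dist_s):
    """Absolute difference in arrival time of the two BC signals at the dendrite."""
    travel_s = dx_um / speed_um_s  # time for the stimulus to cross between inputs
    if direction == "outward":     # proximal input is stimulated first
        t_prox, t_dist = 0.0, travel_s
    elif direction == "inward":    # distal input is stimulated first
        t_prox, t_dist = travel_s, 0.0
    else:
        raise ValueError("direction must be 'outward' or 'inward'")
    return abs((t_prox + lag_prox_s) - (t_dist + lag_dist_s))


def peak_depolarization(gap_s, width_s=0.02, step_s=0.001):
    """Peak of the sum of two unit Gaussian bumps separated by gap_s."""
    def bump(t, centre):
        return math.exp(-0.5 * ((t - centre) / width_s) ** 2)

    n_steps = int(round((gap_s + 6 * width_s) / step_s))
    times = [-3 * width_s + i * step_s for i in range(n_steps + 1)]
    return max(bump(t, 0.0) + bump(t, gap_s) for t in times)


# Hypothetical values: inputs 50 um apart, stimulus at 1,000 um/s,
# proximal lag 80 ms, distal lag 30 ms (lag difference equals travel time).
out_gap = arrival_gap("outward", 50.0, 1000.0, 0.08, 0.03)
in_gap = arrival_gap("inward", 50.0, 1000.0, 0.08, 0.03)
outward_peak = peak_depolarization(out_gap)
inward_peak = peak_depolarization(in_gap)
```

In this toy setting the outward stimulus makes the two signals coincide, so the summed peak is about twice the single-input peak, whereas the inward stimulus separates them by 100 ms and the peak stays close to that of a single input.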


Three-dimensional reconstruction by crowd and 
machine 


We tested our hypothesis by reconstructing Off BC-SAC circuitry using 
e2198, an existing data set of mouse retinal images from serial block-face 
scanning electron microscopy (SBEM)’. The e2198 data set was over- 
segmented by an artificial intelligence into groups of neighbouring voxels 
that were subsets of individual neurons. These ‘supervoxels’ were assem- 
bled by humans into accurate three-dimensional (3D) reconstructions of 
neurons. For this activity, we hired and trained a small number of work- 
ers in the laboratory, and also transformed work into play by mobilizing 
volunteers through EyeWire, a website that turns 3D reconstruction of 
neurons into a game of colouring serial electron microscopy images. 

Through EyeWire, we wanted to enable anyone, anywhere, to par- 
ticipate in our research. The approach is potentially scalable to extremely 
large numbers of ‘citizen scientists’. More importantly, the 3D recon- 
struction of neurons requires highly developed visuospatial abilities, and 
we wondered whether a game could be more effective" than traditional 
methods of recruiting and creating experts. 

In gameplay mode, EyeWire shows a 2D slice through a ‘cube’, an 
e2198 subvolume of 256 × 256 × 256 greyscale voxels (Fig. 2a). Gameplay 
consists of two activities: colouring the image near a location, or searching 
for a new location to colour. Colouring is done by clicking at any location 
in the 2D slice, which causes the supervoxel containing that location to 
turn blue. Searching is done by translating and orienting the slice within 
the cube, and interacting with a 3D rendering of the coloured supervoxels. 

When the player first receives a cube, it already comes with a ‘seed’, a 
contiguous set of coloured supervoxels. The challenge is to colour all the 
rest of the supervoxels that belong to the same neuron, and avoid colour- 
ing other neurons. Gameplay for a cube terminates when the player clicks 
‘submit’, receives a numerical score (Extended Data Fig. 1a), and pro- 
ceeds to the next cube. Because our artificial intelligence is sufficiently 
accurate, colouring supervoxels is faster than manually colouring voxels, 
an older approach to 3D reconstruction”. 
Heidelberg, Germany. https://eyewire.org. Present addresses: 601 N 42nd Street, Seattle, Washington 98103, USA (M.R.); Princeton Neuroscience Institute and Computer Science Department,
Princeton, New Jersey 08544, USA (H.S.S.); Gatsby Computational Neuroscience Unit, London WC1N 3AR, UK (S.C.T.).
The scoring system is designed to reward accurate colouring. This is
nontrivial because EyeWire does not know the correct colouring. Each 
cube is assigned to multiple players (typically 5 to 10), and high scores 
are earned by players who colour supervoxels that other players also 
colour. In other words, the scoring system rewards agreement between 
players, which tends to be the same as rewarding accuracy. 

Consensus is used not only to incentivize individual players, but also to 
enhance the accuracy of the entire system. Any player’s colouring is 
equivalent to a set of supervoxels. Given the colourings of multiple players 
starting from the same seed in the same cube, a consensus can be com- 
puted by voting on each supervoxel. EyeWirer consensus was much more 
accurate than any individual EyeWirer (Fig. 2b, c). 
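
The voting step can be illustrated with a minimal sketch. It assumes that each colouring is a set of supervoxel IDs and that a supervoxel enters the consensus when more than half of the players selected it; the function name `consensus` and the 0.5 threshold are hypothetical choices for illustration, not the rule used by EyeWire.

```python
from collections import Counter


def consensus(colourings, threshold=0.5):
    """Return the supervoxel IDs chosen by more than `threshold` of players.

    `colourings` is a list of sets of supervoxel IDs, one set per player,
    all starting from the same seed in the same cube (assumed encoding).
    """
    votes = Counter()
    for colouring in colourings:
        votes.update(set(colouring))  # one vote per player per supervoxel
    needed = threshold * len(colourings)
    return {sv for sv, n in votes.items() if n > needed}


# Three hypothetical players colouring the same cube.
players = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 5}]
print(sorted(consensus(players)))  # [1, 2, 3]
```

Supervoxels 4 and 5 are each chosen by one player only and are therefore dropped, which is how isolated careless errors are suppressed by the vote.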

Colouring a neuron is more challenging than it sounds. Images are 
corrupted by noise and other artefacts. Neurites take paths that are difficult 
to predict, and can branch without warning. Careless errors result from 
lapses in attention. Extensive practice is required to achieve accuracy. The 




Figure 1 | Starburst amacrine cell and its 
direction selectivity. a, b, Off SAC (red) viewed 
opposite (a) and perpendicular (b) to the light axis. 
GCL, ganglion cell layer. Greyscale images from the 
e2198 data set”. Swellings of distal dendrites are 
presynaptic boutons (inset). Scale bar, 50 μm. c, We
propose that a SAC dendrite is wired to pathways 
with time lags of visual response that differ by an 
amount. d, A previous model invoked the time lag 
due to signal conduction in a passive dendrite”. 

e, The previous model predicts an inward preferred 
direction for the somatic voltage, contrary to 
empirical observations’. 




most accurate EyeWirers (Fig. 2c, top right corner) often had experience 
with thousands of cubes. Improvements in accuracy were observed over 
the course of hundreds of cubes, corresponding to tens of hours of practice 
(Fig. 2d). According to subjective reports of EyeWirers, learning continues 
for much longer than that. By contrast, previous successes at ‘crowdsourcing’
image analysis involved tasks that did not require such extensive
training’””*. 

Reconstructing an entire neuron requires tracing its branches through 
thousands of cubes. This process is coordinated by an automatic spaw- 
ner, which inspects each consensus cube for branches that exit the cube. 
Each exit generates a new cube and seed, which are added to a queue. Eye- 
Wirers are automatically assigned to cubes by an algorithm that attempts 
to balance the number of plays for each cube. 
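
The spawner and the assignment rule can be sketched as a priority queue. This is only an illustration under stated assumptions: the class `CubeQueue`, the function `spawn_from_exits` and the rule "always hand out the cube with the fewest plays" are hypothetical, and the sketch ignores practical constraints such as not giving the same cube twice to one player.

```python
import heapq


class CubeQueue:
    """Queue of cubes that hands out the least-played cube first."""

    def __init__(self):
        self._heap = []   # entries are (plays, insertion_order, cube_id)
        self._order = 0   # tie-breaker so that older cubes go first

    def add_cube(self, cube_id):
        heapq.heappush(self._heap, (0, self._order, cube_id))
        self._order += 1

    def assign(self):
        """Return the cube with the fewest plays and count one more play."""
        plays, order, cube_id = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (plays + 1, order, cube_id))
        return cube_id


def spawn_from_exits(queue, exit_cube_ids):
    """Add one new cube per branch that exits a consensus cube."""
    for cube_id in exit_cube_ids:
        queue.add_cube(cube_id)


queue = CubeQueue()
spawn_from_exits(queue, ["A", "B"])          # two branches left the cube
print([queue.assign() for _ in range(4)])    # ['A', 'B', 'A', 'B']
```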

Over 100,000 registered EyeWirers have been recruited by news 
reports, social media and the EyeWire blog. Players span a broad range 
of ages and educational levels, come from over 130 countries, and the 


Figure 2 | EyeWire combines crowd and 
artificial intelligence. a, 3D and 2D views in the 
neuron reconstruction game. b, Precision and 
recall are two measures of accuracy. c, Accuracy of 
artificial intelligence (AI), 5,881 EyeWirers, and 
EyeWirer consensus on reconstruction of a 
ganglion cell. d, EyeWirer precision and recall 
increase with number of cubes submitted. Solid 
lines are median values across 208 EyeWirers who 
submitted at least 500 cubes, and shaded regions 
indicate 25th to 75th percentile. 
great majority have no formal training in neuroscience (Extended Data
Figs 2 and 3 and Supplementary Notes). These statistics show that Eye- 
Wire indeed widens participation in neuroscience research. At the same 
time, the most avid players constitute an elite group with disproportionate 
achievements. For example, the top 100 players have contributed about 
half of all cubes completed in EyeWire. 

Laboratory workers also reconstructed neurons independently of Eye- 
Wire, with a more sophisticated version of the user interface (Methods). 
Their reconstructions were pooled with those of EyeWirers for the ana- 
lyses reported below. Reconstruction error was quantified (Methods), 
and was treated like other kinds of experimental error when calculating 
confidence intervals from our data. 


Contact analysis 
We reconstructed 195 Off BC axons and 79 Off SACs from e2198 
(Fig. 3b and Extended Data Fig. 4). The e2198 retina was stained in an 
unconventional way that did not mark intracellular structures such as 
neurotransmitter vesicles’, and reliable morphological criteria for iden- 
tification of BC presynaptic terminals are unknown. As an indirect mea- 
sure of connectivity, contact areas were computed for all BC-SAC pairs. 
The resulting ‘contact matrix’ was analysed through two subsequent steps. 
In the first step, Off BC axons were classified into five cell types, fol- 
lowing structural criteria’* established to correspond with previous mo- 
lecular definitions® (Methods and Extended Data Fig. 5). BC types stratify 
at characteristic depths in the inner plexiform layer (IPL), and vary in size 
(Fig. 4a). The BCs of each type formed a ‘mosaic’, meaning that cells were 
spaced roughly periodically (Extended Data Fig. 6a–e). This is generally
accepted as an important defining property of a retinal cell type. Type 
densities (Extended Data Fig. 6f) were roughly consistent with previous 
reports®. When the columns of the contact matrix were sorted by BC type 
(Fig. 4b), it became evident that BC2 and BC3a contact SACs more than 
other BC types. 


Figure 3 | 3D reconstructions of Off BCs and SACs. a, b, Cells viewed 
opposite the light axis. BCs alone (a); BCs with SACs (b). Scale bar, 50 μm.
In the second step, we averaged contact area over BC-SAC pairs of the
same BC type and similar distance between the BC axon and the SAC soma
in the plane tangential to the retina (Fig. 4c). These absolute areas were
normalized to convert them into the percentage of SAC surface area covered
by BCs of a given type (Methods). The resulting graphs show that BC2
prefers to contact SAC dendrites close to the SAC soma, whereas BC3a 
prefers to contact far from the soma (Fig. 4d and Extended Data Fig. 7c). 

Imaging of intracellular calcium in BC axons’ and extracellular glu- 
tamate around BC axons’ indicates that BC2 lags BC3a in visual responses 
by 50-100 ms. Therefore BC-SAC wiring appears to possess the space-time 
specificity appropriate for an outward preferred direction, as we proposed 
(Fig. 1c). 


Co-stratification analysis 

Off SACs stratify at a particular depth in the IPL (Fig. 1b). Why this depth 
and not some other? From Fig. 4a, it is obvious that this depth is appro- 
priate for wiring with BC2 and BC3a, as required by our model of DS 
emergence. Following this logic one step further, we wondered whether 
the observed dependence of contact on distance from the SAC soma 
might be reflected in fine aspects of SAC morphology. We hypothesized 
that SAC dendrites are ‘tilted’, moving deeper into the IPL with distance 
from the SAC soma. Such a change in depth would be compatible with 
more overlap with BC2 near the soma, and more overlap with BC3a 
far from the soma, as BC3a is deeper in the IPL than BC2 (Fig. 4a and 
Supplementary Video 1). 

The hypothesized tilt turns out to exist (Fig. 5a). Very close to the SAC 
soma, the dendrites dive sharply into the IPL from the inner nuclear layer 
(INL). Surprisingly, IPL depth continues to increase as distance from the 
SAC soma in the tangential plane ranges from 20 to 80 μm. The slight
increase is not evident in a single dendrite (Fig. 1b), but emerges from 
statistical averaging. 

Could dendritic tilt be the cause of the observed variation in BC- 
SAC contact with distance (Fig. 4d)? We cannot address causality on 
the basis of our data, but we can test how well the tilt predicts contact 
variation. We computed the stratification profiles of BC types (Fig. 5a), 
defined as the one-dimensional density of BC surface area along the 
depth of the IPL. We also computed the stratification profile of SAC 
dendrites at various distances from the SAC soma (quartiles, Fig. 5a). 
Assuming that BC and SAC arborizations are statistically independent 
of each other, we estimated contact from ‘co-stratification’, defined as 
the integral over IPL depth of the product of BC and SAC stratification 
profiles (Methods). 
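
As an illustration of the definition above, co-stratification can be approximated numerically by a Riemann sum over depth bins. The sketch assumes that each profile is sampled on the same evenly spaced depth grid and normalized to unit area; the function names, the bin width and the example profiles are hypothetical, and the exact normalization used in the Methods may differ.

```python
def normalise(profile, dz):
    """Scale a sampled profile so that it integrates to 1 over depth."""
    total = sum(profile) * dz
    return [p / total for p in profile]


def co_stratification(bc_profile, sac_profile, dz):
    """Riemann-sum estimate of the integral of b(z) * s(z) over IPL depth."""
    if len(bc_profile) != len(sac_profile):
        raise ValueError("profiles must share the same depth grid")
    return sum(b * s for b, s in zip(bc_profile, sac_profile)) * dz


dz = 0.2  # bin width, as a fraction of IPL depth (made-up value)

# Made-up profiles on five depth bins, shallow to deep.
sac = normalise([0, 0, 1, 1, 0], dz)
bc_shallow = normalise([0, 1, 1, 0, 0], dz)
bc_deep = normalise([0, 0, 1, 1, 0], dz)

print(co_stratification(bc_shallow, sac, dz))
print(co_stratification(bc_deep, sac, dz))
```

In this toy case the deeper BC profile overlaps the SAC profile in two bins rather than one, so its predicted contact is twice as large.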

We found that actual BC2 contact depends more strongly on distance 
than predicted; the slight change in IPL depth after the initial plunge 
appears too small to account for the large change in actual BC2 contact. 
In other failures of contact prediction, BC3a, BC3b and BC4 stratify at 
the same IPL depths (Fig. 5a), yet BC3a makes much more contact than 
BC3b or BC4. Also, actual BC3a contact plummets near the tips of SAC 
dendrites (Fig. 4d), whereas predicted contact does not change at all 
because the IPL depth of SAC dendrites is constant in this region (Fig. 5b). 
Overall, the total contact from all BC types seems low in this region 
(Extended Data Fig. 7d), suggesting that BCs avoid making synaptic 
inputs to the most distal SAC dendrites. This runs counter to the con- 
ventional belief that input synapses are uniformly distributed over the 
entire length of SAC dendrites’. The unreliability of inferring contact 
from co-stratification is illustrated by numerous examples of SAC den- 
drites that pass through BC axonal arborizations without making any 
contact at all (Extended Data Fig. 8). 


Model of the BC-SAC circuit 


We mentioned previously that BC2 lags BC3a in visual response. There is 
another important difference: BC3a responds more transiently to step 
changes in illumination, whereas BC2 exhibits more sustained responses. 
The implications of the sustained-transient distinction for DS can be 
understood using a mathematical model. The activity of a retinal neuron 
is often approximated as a linear spatiotemporal filtering of the visual 
Figure 4 | BC-SAC contact. a, Off BCs were
divided into five types®“* on the basis of IPL depth
and size. Scale bar, 10 μm. b, Contact areas of BC-
SAC pairs, sorted by BC types. c, Pairs were further
sorted by the distance of the BC axon from the SAC
soma, as measured in the tangential plane. Scale
… distance, normalized to percentage of SAC surface
area at that distance (Extended Data Fig. 3b).
stimulus followed by a nonlinearity'®””. Such a ‘linear-nonlinear’ model
for the output O(t) of the SAC dendrite can be written as 


$$O(t) = \left[ \int dx\, dt'\; W(x, t - t')\, I(x, t') \right]^{+} \qquad (1)$$


For simplicity, the dendrite and visual stimulus I(x,t) are restricted to a
single spatial dimension x, and the nonlinearity is a half-wave rectification,
[z]^+ = max{z,0}. We interpret the integral in equation (1) as the summed
input from the BCs presynaptic to the SAC. The nonlinearity could arise 
from various biophysical mechanisms, such as synaptic transmission from 
SACs to other neurons. The spatiotemporal filter W(x,t) is a sum of two 
functions, 


$$W(x,t) = U_s(x)\, v_s(t) + U_t(x)\, v_t(t) \qquad (2)$$


corresponding to contributions from BC2 and BC3a. The sustained
temporal filter v_s(t) is monophasic, whereas the transient filter v_t(t) is
biphasic (Fig. 6a). The spatial filter U_s(x) represents the entire set of all
BC2 inputs to the dendrite, and can be estimated from the BC2 contact
area graph in Fig. 4d. Similarly, U_t(x) can be estimated from the BC3a
contact area graph. The two spatial filters are displaced relative to each
other (Fig. 6a), because BC3a tends to contact SAC dendrites at more
distal locations than BC2.


[Figure 5 panels. a, IPL depth (%) versus distance from SAC soma (μm), showing median depth and 25–75th percentiles, with bipolar surface area (μm²) for BC1, BC2, BC3a, BC3b and BC4. b, SAC surface area (%) versus distance from SAC soma (μm).]


Figure 5 | BC-SAC co-stratification. a, SAC dendrites move deeper into the 
IPL (median depth, red line) with increasing distance from the SAC soma in the 
tangential plane. Stratification profiles of BC types, defined as density of surface 
area over the depth of the IPL. b, Co-stratification predictions of BC-SAC 
contact area versus distance from the SAC soma. The curves are normalized by 
SAC area at each distance, and are therefore directly comparable with those of 
Fig. 4d. 
Each of the terms in the sum of equation (2) is said to be ‘space-time
separable’, because it is the product of a function of space and a function
of time. It was previously observed that a spatiotemporal filter W(x,t) of
this form can endow a model like equation (1) with DS’*””. This is illustrated
by Fig. 6 using the fact that the convolution in equation (1) is
equivalent to ‘sliding’ the spatiotemporal filter W in time over the stimulus
I, and computing the overlap at each time. The filter W(x,t) is oriented
in space-time (Fig. 6a), and so also is a moving stimulus I(x,t) (Fig. 6g, h).
The overlap with a rightward-moving stimulus (Fig. 6h) is greater than
for a leftward one (Fig. 6g), so the model exhibits DS with a rightward
preferred direction.
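
A discrete toy version of the linear-nonlinear model makes the argument concrete. The sketch below is an illustration under strong simplifying assumptions, not the model fitted in the paper: space is reduced to two positions (proximal x = 0 receiving the sustained pathway, distal x = 1 receiving the transient pathway), time is counted in unit steps, and the filter values and function names are hypothetical.

```python
def separable_filter(U_s, v_s, U_t, v_t):
    """Build W(x, lag) = U_s(x) v_s(lag) + U_t(x) v_t(lag) as a dict."""
    W = {}
    for U, v in ((U_s, v_s), (U_t, v_t)):
        for x, u in U.items():
            for lag, k in v.items():
                W[(x, lag)] = W.get((x, lag), 0.0) + u * k
    return W


def ln_output(W, stimulus):
    """Discrete linear-nonlinear model: convolve, then half-wave rectify.

    `stimulus` maps (x, t') to intensity; the result maps t to O(t).
    """
    drive = {}
    for (x, t_in), intensity in stimulus.items():
        for (fx, lag), w in W.items():
            if fx == x:
                t_out = t_in + lag
                drive[t_out] = drive.get(t_out, 0.0) + w * intensity
    return {t: max(z, 0.0) for t, z in drive.items()}


# Toy assumption: sustained input at proximal x=0, transient input at distal x=1.
U_s, U_t = {0: 1.0}, {1: 1.0}
v_s = {1: 1.0}             # monophasic, peaks one time step late
v_t = {0: 1.0, 2: -1.0}    # biphasic: early positive lobe, late negative lobe
W = separable_filter(U_s, v_s, U_t, v_t)

outward = {(0, 0): 1.0, (1, 1): 1.0}   # spot moves from x=0 to x=1
inward = {(1, 0): 1.0, (0, 1): 1.0}    # spot moves from x=1 to x=0

peak_out = max(ln_output(W, outward).values())
peak_in = max(ln_output(W, inward).values())
print(peak_out, peak_in)  # 2.0 1.0
```

For outward motion the delayed sustained input and the early positive lobe of the transient input arrive at the same time step and add. For inward motion the sustained input coincides with the negative lobe and is cancelled, so the peak response is smaller.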

How is DS affected by the biphasic shape of the transient temporal
filter, v_t(t)? If we remove the negative lobe (Fig. 6c), then v_t(t) will become
monophasic like v_s(t) and their relation closer to a simple time lag
(Fig. 6d). We will refer to this model as a ‘Reichardt detector’, in honour
of the pioneering researcher Werner Reichardt, although it more closely
resembles a subunit of his model”’. On the other hand, removing the
positive lobe of v_t(t) makes it monophasic but with inverted sign relative
to the sustained filter (Fig. 6e). The result (Fig. 6f) resembles a DS model 
originally proposed by Barlow and Levick”’. 

Both modified models (Fig. 6d, f) exhibit DS. In the Reichardt detector, 
the inputs from the two arms enhance each other for motion in the 
preferred direction. In the Barlow-Levick detector, the two inputs cancel 
each other for motion in the null direction. As our sustained-transient 
model (Fig. 6b) uses both mechanisms, it should exhibit more DS than 
either detector. Our model is related to versions of the Reichardt detector 
with low-pass and high-pass filters on the two arms”. 

In the original Barlow-Levick model, the negative filter corresponded 
to synaptic inhibition. As BCs are believed to be excitatory, negative BC 
input in our model represents a reduction of excitation relative to the 
resting level, rather than true inhibition. Signalling by reduced excitation 
may be possible, at least for low-contrast stimuli, as BC ribbon synapses 
may have a significant resting rate of transmitter release”. 

The model of equations (1) and (2) is a useful starting point for many 
theoretical investigations that are outside the scope of this article. For 
example, DS dependency on the spatial and temporal frequencies of a si- 
nusoidal travelling wave stimulus is calculated in Supplementary Equations, 
and DS dependence on stimulus speed is graphed in Extended Data Fig. 9. 


Discussion 


In our DS model, SAC dendrites are wired to BC types with different time 
lags. A previous model did not distinguish between BC types, and instead 
relied on the time lag of signal conduction within the SAC dendrite itself* 
Figure 6 | Mathematical model of the BC-SAC circuit. a, Spatiotemporal
filter of equation (2). Green is positive, red is negative, and grey is zero. b, The
transient pathway effectively combines a positive channel that leads the
sustained pathway by t and a negative channel that lags by t. c, Removing the
negative channel yields a Reichardt detector (d). e, Removing the positive
channel yields a Barlow-Levick detector (f). A moving visual stimulus I(x,t) is
oriented in space-time (g, h), and so are the spatiotemporal filters (a, c, e).


(Fig. 1d). Like most other amacrine cells, SACs lack an axon; their output 
synapses are found in the distal zones of their dendrites’* (Fig. 1a, inset). 
Owing to dendritic conduction delay, proximal BC inputs should take 
longer to reach the output synapses than distal BC inputs (Fig. 1d). There- 
fore this time lag is also consistent with the empirical finding of an 
outward preferred direction. To summarize the novelty of our hypothesis, 
we place the time lag before BC-SAC synapses, whereas the previous 
model places it after BC-SAC synapses. 

The postsynaptic delay model has a major weakness. If dendritic con- 
duction were the only source of time lag, the somatic voltage would exhibit 
DS with an inward preferred direction, but this is inconsistent with intracellular
recordings’ (Fig. 1e). By contrast, the presynaptic delay model is
compatible with approximating a SAC dendrite as isopotential (Fig. 1c), 
so preferred direction is predicted to be independent of the location of 
the voltage measurement, consistent with empirical data’. It may also 
be possible to make the postsynaptic delay model consistent with ex- 
periments by adding active dendritic conductances*. 

The presynaptic and postsynaptic delay models are not mutually ex- 
clusive. If they work together, passive cable theory suggests that pre- 
synaptic delay dominates, because estimated postsynaptic delay is much 
shorter than the time lag between BC2 and BC3a (Supplementary Equa- 
tions). Can we gauge the relative importance of the delays empirically 
rather than theoretically? One way would be intracellular recording at the
SAC soma of responses to visual stimulation at various dendritic loca- 
tions. If postsynaptic delay dominates, then response latency will grow 
with distance of the visual stimulus from the soma. If presynaptic delay 
dominates, then distal stimulation will evoke somatic responses with 
shorter latency than proximal stimulation. This prediction may seem 
counterintuitive, but is an obvious outcome of our model. 

Many other models of DS emergence in SACs invoke inhibition as 
well as excitation””-**. We have focused on excitatory mechanisms, as 
blocking inhibition does not abolish DS’. However, inhibition may have 
the effect of enhancing DS, and its role should be investigated further. 

This work focused on Off BC-SAC circuitry. An analogous sustained- 
transient distinction can also be made for On BC types”®. It remains to be 
seen whether their connectivity with On SACs depends on distance from 
the soma. If this turns out to be the case, then the model of Fig. 6 could 
serve as a general theory of motion detection by both On and Off SACs. 
The model filter of Fig. 6a also resembles the spatiotemporal receptive 
field of the J type of ganglion cell (see Fig. 3b of ref. 29). 

Neural activity imaging” and connectomic analysis”' have recently 
identified a plausible candidate for the site of DS emergence in the fly 
visual system. If our theory is correct, then the analogies between insect 
and mammalian motion detection’ are more far-reaching than prev- 
iously suspected, with fly T4 and T5 cells corresponding to On and Off 
SAC dendrites in both connectivity and function. 

A glimmer of space-time wiring specificity can even be seen in the 
structure of the SAC itself. As BC types with different time lags arborize 
at different IPL depths, IPL depth can be regarded as a time axis. There- 
fore, the slight tilt of the SAC dendrites in the IPL (Fig. 5a) could be related 
to the orientation of the SAC receptive field in space-time (Fig. 6a). 
However, dendritic tilt alone is not sufficient to predict our model, as 
co-stratification sometimes fails to predict contact (Figs 4d and 5b). For 
example, co-stratification predicts strong BC4 connectivity to distal SAC 
dendrites. This would favour an inward preferred direction, contrary to 
what is observed, because BC2 leads (not lags) BC4 in visual responses’. 

The idea that contact (or connectivity) can be inferred from co-stratification 
is sometimes known as Peters’ rule®’, and has also been applied to estim- 
ate neocortical connectivity***. The present work shows that fairly subtle 
violations of Peters’ rule may be important for visual function. Previous 
research suggests that On-Off direction-selective ganglion cells inherit 
their DS from SAC inputs owing to a strong violation of Peters’ rule’. 

Our findings were made possible by using artificial intelligence to reduce 
the amount of human effort required for 3D reconstruction of neurons. 
Even after the labour savings, our research required great human effort 
from a handful of paid workers in the laboratory and a large number of 
volunteers through EyeWire. Our experiences do not support claims that 
the ‘wisdom of the crowd’ should replace experts”. Instead, EyeWire 
depends on cooperation between laboratory experts and online amateurs 
(Methods). Furthermore, some amateurs developed remarkable expertise 
and were promoted to increasingly sophisticated roles within the Eye- 
Wire community (Supplementary Notes). We believe that crowd wisdom 
requires amplifying the expert voices within the crowd, and also empow- 
ering individuals to become experts. Fortunately, such goals are well- 
matched to the game format. 

The EyeWire artificial intelligence was based on a deep convolutional 
network***", Similar networks have been successfully applied to serial 
electron microscopy images obtained using conventional staining tech- 
niques that mark intracellular organelles”. Extending EyeWire to such 
images, in which synapses are clearly visible, would enable a true connection 
analysis that goes beyond the contact and co-stratification analyses 
used here. 

Our work demonstrates that reconstructing a neural circuit can pro- 
vide surprising insights into its function. Much more will be learned as 
reconstruction speed grows. The combination of crowd and artificial in- 
telligence promises a continuous upward path of improvement, as human 
input from the crowd is not only useful for generating neuroscience 
discoveries, but also for making the artificial intelligence more capable
through machine learning. 

Note added in proof: Further evidence that BC axons exhibit little or 
no DS appeared while this paper was in press”. 


METHODS SUMMARY 


A convolutional network was trained to detect neural boundaries via the MALIS 
procedure*’ and CNPKG (https://github.com/srinituraga/cnpkg/), which is based 
on Cortical Network Simulator“. The convolutional network was applied to the 
e2198 data set, which was then segmented into supervoxels by a modified version 
of the watershed algorithm. Paid workers and volunteer EyeWirers reconstructed 
neurons in 3D by assembling supervoxels. The retina was computationally flat- 
tened, reconstructed neurons were classified by their structural properties, and 
contact and co-stratification were analysed by custom Matlab and C++ code.


METHODS


We worked with the e2198 data set” rather than the e2006 data set’* because e2198 
is large enough to encompass entire SAC dendrites (~150 μm). All dimensions are
uncorrected for tissue shrinkage, which was previously estimated at 14% by com- 
parison of two-photon and serial electron microscopy images". 

Machine learning. The boundaries between neurons in subvolumes of the e2198 
and e2006 data sets were manually traced. Using this as ground truth, a convolu- 
tional network was trained to detect boundaries between neurons using the MALIS 
method*. The convolutional network had the same architecture as one used 
previously", and produced as output an affinity graph connecting nearest neighbour
voxels“. Any subvolume of e2198 could be oversegmented by applying a
modified watershed algorithm to the appropriate subgraph. The regions of the 
oversegmentation are called supervoxels. 

Reconstruction by workers. A team of part-time workers, numbering about half 
a dozen at any given time, reconstructed neurons using a more sophisticated version 
of the EyeWire interface. Workers were hired on the basis of an interview and a test 
of software use passed by three-quarters of the applicants. They were trained for 
40-50 h before generating reconstructions used for research. Their skills typically 
improved for months or even years after the initial training period, and were superior 
to those of professional neuroscientists without reconstruction experience. 

As with EyeWire, the task of reconstructing an entire neuron was divided into 
subtasks, each of which involved reconstructing the neuron within a subvolume 
starting from a supervoxel ‘seed’. However, the subvolumes were roughly 100 
times larger than EyeWire cubes, and only two workers were assigned to each 
subvolume. 

In the first stage of error correction, disagreements were detected by computer, 
and resolved by one of the two workers, or a third worker. The third occasionally 
detected and corrected errors that were not disagreements between the first two. 
Most disagreements were the result of careless errors, and were easily resolved. 
More rarely, there were disagreements caused by fundamental ambiguities in the 
image. These locations were noted for later examination in a further stage of error 
correction. 
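
The computer-detected disagreements of the first stage can be pictured as a set difference between the two workers' reconstructions. The sketch below is a hypothetical illustration only: it assumes that each reconstruction is a set of supervoxel IDs and that the resolving worker decides which disputed supervoxels to keep; the function names are not taken from the laboratory software.

```python
def disagreements(worker_a, worker_b):
    """Supervoxels selected by exactly one of the two workers."""
    return set(worker_a) ^ set(worker_b)


def resolve(worker_a, worker_b, keep):
    """Merge two reconstructions, keeping only the disputed supervoxels
    that the resolving worker approved (`keep`)."""
    agreed = set(worker_a) & set(worker_b)
    disputed = disagreements(worker_a, worker_b)
    return agreed | (disputed & set(keep))


a = {10, 11, 12, 13}
b = {10, 11, 12, 14}
print(sorted(disagreements(a, b)))        # [13, 14]
print(sorted(resolve(a, b, keep={13})))   # [10, 11, 12, 13]
```

Errors made identically by both workers do not appear in the set difference, which is consistent with the text's remark that a third worker occasionally found errors that were not disagreements.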

This second stage relied on 3D reconstructions of entire neurons assembled 
from multiple subvolumes and inspected by one of the authors (J.S.K.). Suspicious 
branches or terminations, as well as overlaps between reconstructions of different 
neurons were detected. The original image was re-examined at these locations to 
check for errors. The process was repeated until no further errors could be detected. 

The precision of our final reconstruction relative to the truth is probably comparable
to the precision of the penultimate reconstruction relative to the final
reconstruction, 0.99 for SACs and 0.96 for BCs. Recall is probably somewhat 
poorer, because missing branches are more difficult to detect than superfluous 
branches. Recall must be reasonably good for SACs, as missing branches would be 
detected by deviations from the typical SAC shape and radius.

Reconstruction by EyeWirers. Some reconstruction errors slip past the consensus
mechanism. These are detected through visual inspection of an ‘overview’
mode, which displays 3D renderings of entire neurons currently under recon- 
struction (Extended Data Fig. 1b). False branches become obvious once they are 
long enough, and are reported by EyeWirers through chat or email. They are chopped 
off by GrimReaper, a special EyeWirer played by laboratory experts endowed with the 
superpower of overruling the consensus. GrimReaper also extends branches that have 
terminated prematurely. Correction by GrimReaper is similar to the second stage of 
error correction described above, so the final reconstruction presumably has similar 
accuracy. 

SAC reconstructions are extremely difficult for two reasons: (1) SAC dendrites 
are very thin and may falsely appear to terminate, owing to limited spatial reso- 
lution and imperfect staining; and (2) the interiors of many SAC boutons con- 
tained irregular darkenings, which could falsely appear like cellular boundaries. 
(The reason for the darkening is unclear, as the extracellular staining procedure 
was not intended to mark intracellular structures.) 

Novices tend to prematurely terminate SAC dendrites. Experts know that most 
cubes do not contain termination points, and therefore try harder to find continua- 
tions, using a variety of sophisticated search strategies. GrimReaper is also allowed to 
view how the cube fits into the entire reconstructed neuron. This additional spatial 
context can be used to disambiguate difficult cubes, given knowledge of the typical 
appearance of a SAC. 

Before learning in normal gameplay (Fig. 2d), all EyeWirers are required to go 
through a training session immediately after registering for the site. This consists 
of a sequence of tutorial cubes, each of which was previously coloured by an expert 
(Extended Data Fig. 1c). Each cube teaches through instructions and per-click 
feedback about accuracy based on comparing the EyeWirer’s selections with those 
of the expert. After submitting a tutorial cube, the EyeWirer is given a chance to view 
mistakes. 
Accuracy is monitored on a weekly basis by computing the precision and recall
of each EyeWirer with respect to the truth, defined as neuron reconstructions 
based on EyeWire consensus followed by GrimReaper corrections. Less accurate 
EyeWirers are given less weight in the vote. 
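
The accuracy measures and the weighted vote can be sketched as follows. This is a minimal illustration with several assumptions: precision and recall are computed here by counting supervoxels rather than by volume, the mapping from accuracy to weight is left to the caller, and the function names and threshold are hypothetical.

```python
def precision_recall(selected, truth):
    """Precision and recall of a colouring against a reference colouring."""
    selected, truth = set(selected), set(truth)
    true_positives = len(selected & truth)
    precision = true_positives / len(selected) if selected else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


def weighted_consensus(colourings, weights, threshold=0.5):
    """Keep supervoxels whose summed player weight exceeds
    `threshold` times the total weight of all players."""
    total = sum(weights)
    score = {}
    for colouring, w in zip(colourings, weights):
        for sv in set(colouring):
            score[sv] = score.get(sv, 0.0) + w
    return {sv for sv, s in score.items() if s > threshold * total}


colourings = [{1, 2}, {1, 3}, {3}]
print(sorted(weighted_consensus(colourings, [1.0, 1.0, 1.0])))  # [1, 3]
print(sorted(weighted_consensus(colourings, [1.0, 0.2, 0.2])))  # [1, 2]
```

With equal weights, supervoxel 3 wins because two players chose it. When the first player is trusted five times more than the others, supervoxel 2 is kept and supervoxel 3 is dropped.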

Players’ daily, weekly and monthly scores are publicly displayed on a leader- 
board (Extended Data Fig. 1b, right), motivating players to excel through com- 
petition. Players communicate with each other through online ‘chat’ (Extended 
Data Fig. 1b, left) and discussion forums. 

A ‘beta test’ version of EyeWire was deployed in February 2012 and attracted a 
small group of users, who helped guide software development. EyeWire officially 
launched in December 2012. 

Reconstruction of Off SACs. Off SACs were recognized by their somata in the 
INL, narrow IPL stratification at roughly one third of the depth from the INL to 
the ganglion cell layer, and characteristic ‘starburst’ appearance (Fig. 1a). 

Off SACs were reconstructed by: (1) forward tracing from the soma to dend- 
ritic tips; and (2) backward tracing from varicosities on candidate SAC dendrites 
to the soma. In the forward method, a candidate SAC soma was identified as a 
supervoxel with a characteristic pattern of dendritic stubs bearing spiny protru- 
sions. By the time reconstruction progressed to approximately half of the average 
SAC radius, an Off SAC could be conclusively recognized by its starburst shape 
and narrow stratification at the appropriate IPL depth. More than 90% of candi- 
dates turned out to be SACs. 

In the backward method, we located a thin dendrite with varicosities at the 
appropriate IPL depth. This was reconstructed back to the soma, and then the rest 
of the dendrites were reconstructed from the soma to the tips. The cell could be 
discarded at any point during this process, if its dendrites escaped from the 
appropriate IPL depth or failed to exhibit the proper morphological character- 
istics. Less than 25% of initial candidates ended up confirmed as SACs. 

In total, 79 Off SACs were reconstructed, 39 by forward tracing and 52 by 
backward tracing. This is more than half the entire population in e2198, judging 
from the published density**. After candidates were identified by one of the authors 
(J.S.K.), reconstructions were performed by laboratory workers (59 cells) or by 
EyeWirers (29 cells). Both pairs of numbers sum to more than 79, because the sets 
overlapped (12 for forward/backward, 9 for workers/EyeWirers). 
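
The bookkeeping above is ordinary inclusion-exclusion, and a minimal sketch can confirm that both pairs of counts reduce to 79 once the overlaps are subtracted. The function name `union_size` is introduced here for illustration only.

```python
def union_size(size_a: int, size_b: int, overlap: int) -> int:
    """Number of distinct items in two sets, given their sizes and shared count."""
    return size_a + size_b - overlap


# Counts quoted in the text.
cells_by_method = union_size(39, 52, 12)  # forward, backward, traced both ways
cells_by_tracer = union_size(59, 29, 9)   # laboratory workers, EyeWirers, both

print(cells_by_method, cells_by_tracer)
```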

In March 2012, laboratory workers began reconstruction of SACs. In March 
2013, EyeWirers were invited to the ‘Starburst Challenge’, a sequence of tutorial 
cubes drawn from SACs. Those who passed with sufficient accuracy were an elite 
group allowed to reconstruct SACs (Supplementary Information). EyeWirers even- 
tually shouldered most of the burden of SAC reconstruction, with only 8% of SAC 
cubes needing correction by GrimReaper. This enabled laboratory workers to shift 
their focus to BCs, as described below. 

Reconstruction of Off BCs. The somata of Off BCs were generally outside e2198, 
which extended only partially into the INL (Fig. 1 of ref. 9). The trunks of candidate 
BC axons were located in the interstices of the INL, and followed into the IPL. If the
axons arborized in the Off region of the INL, they were fully reconstructed. Cells 
that violated known BC structures were identified as amacrine cells and discarded". 

BC axons were difficult to reconstruct owing to poor staining and their highly
irregular shapes. They could not be accurately reconstructed (either by online
volunteers or laboratory experts) within the 256 × 256 × 256 cubes of EyeWire,
which were too small to provide sufficient spatial context. Therefore BCs were 
reconstructed only by laboratory workers using the large subvolumes mentioned 
above. 
Coordinate system. For more precise quantification of structural properties, a 
new coordinate system was defined by applying a nonlinear transformation to 
neurons so as to flatten the IPL and make it perpendicular to one of the coordinate 
axes. The nonlinear transformation was found by the following steps. First a 
global planar approximation to the Off SAC surface was computed. Then the 
centroid of all the SACs was projected onto this global plane to define the origin of 
the coordinate system. The projection was along the coordinate axis of the e2198 
volume closest in direction to the light axis. 

To correct for curvature, an azimuthal equidistant projection⁴⁶ of the Off SAC
surface onto the global plane was made about the origin. Then local planar approx- 
imations to the SAC surface were computed in the neighbourhoods of every node 
in a triangular lattice. At each point in a triangle, the SAC surface was approxi- 
mated by computing the mean of the planar approximations (as quaternions with 
yaw constrained to be zero) for the triangle’s vertices, weighted by distance of the 
point from the vertices. 

The Off SACs were defined as 32% IPL depth. We also reconstructed a few On 
SACs, and defined them as 62%. These choices placed the edge of the INL at 0%. 
Structural properties of all cells were computed on the basis of locations of their 
surface voxels after transformation into the new coordinates. 

Classification of Off bipolar cells. BC stratification profiles were computed by 
dividing surface voxels into 100 bins spanning 0 to 100% IPL depth. Classification 
into cell types was done by using methods similar to those described previously’.
The BCs were split into shallow (BC1/2) and deep (BC3/4) clusters using the 75th 
percentile depth of the stratification profile. The BC1/2 cluster was further sub- 
divided into two clusters by stratification width, defined as the difference between 75th 
and 25th percentile depths. On the basis of cells per square millimetre (Extended Data 
Fig. 6f), we inferred that the wider cluster was BC2 and the narrower cluster was 
BC1. These two types were originally defined by molecular criteria’, and our inferred 
correspondence with structural definitions is transposed relative to a previous 
report’*. The BC3/4 cluster was subdivided into BC4 and BC3 by the 10th percentile 
depth, because the molecularly defined BC4 stratifies closer to the INL’. Finally, BC3 
was subdivided into BC3a and BC3b on the basis of axonal arborization volume, 
with BC3a having the larger axonal volume. Each of the above subdivision steps was 
based on a feature with a roughly bimodal histogram (Extended Data Fig. 5). 
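
The successive splits described above can be sketched as a small decision tree over percentile depths. This is an illustrative sketch only: the text does not state the numerical cut points (they were read from the roughly bimodal histograms), so the thresholds in `EXAMPLE_CUTS` are placeholders, and the function names and the choice of the upper bin edge as the percentile depth are assumptions.

```python
from bisect import bisect_left
from itertools import accumulate


def stratification_profile(depths, n_bins=100):
    """Count surface voxels in n_bins equal bins spanning 0-100% IPL depth."""
    profile = [0] * n_bins
    for depth in depths:
        if 0.0 <= depth <= 100.0:
            index = min(int(depth * n_bins / 100.0), n_bins - 1)
            profile[index] += 1
    return profile


def percentile_depth(profile, q):
    """Upper edge (% IPL depth) of the first bin where the cumulative count reaches q percent."""
    total = sum(profile)
    if total == 0:
        raise ValueError("empty stratification profile")
    cumulative = list(accumulate(profile))
    index = bisect_left(cumulative, q * total / 100.0)
    return (index + 1) * 100.0 / len(profile)


def classify_off_bc(depths, axon_volume, cuts):
    """Apply the four successive splits; `cuts` holds hypothetical thresholds."""
    profile = stratification_profile(depths)
    p10, p25, p75 = (percentile_depth(profile, q) for q in (10, 25, 75))
    if p75 < cuts["shallow_vs_deep_p75"]:      # shallow cluster, BC1/2
        return "BC2" if p75 - p25 > cuts["bc1_vs_bc2_width"] else "BC1"
    if p10 < cuts["bc4_vs_bc3_p10"]:           # BC4 stratifies closer to the INL
        return "BC4"
    return "BC3a" if axon_volume > cuts["bc3a_vs_bc3b_volume"] else "BC3b"


# Placeholder thresholds, chosen only to make the example run.
EXAMPLE_CUTS = {
    "shallow_vs_deep_p75": 30.0,
    "bc1_vs_bc2_width": 10.0,
    "bc4_vs_bc3_p10": 25.0,
    "bc3a_vs_bc3b_volume": 100.0,
}

print(classify_off_bc([10.5, 12.5, 14.5, 16.5], axon_volume=50.0, cuts=EXAMPLE_CUTS))
```

The sketch covers only the initial clustering; the mosaic-based swaps described in the next paragraph are not modelled.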

The result still contained a small number of classification errors, detected when 
adjacent BCs of the same type overlapped enough to violate the mosaic property. 
Corrections were made by an automatic algorithm that greedily swapped cells 
from one cluster to another such that the total overlap between convex hulls of 
cells of a given type was minimized. Two swaps were vetoed by an expert (J.S.K.) 
on the basis of morphological features. In all, six cells were swapped within BC1/2 
and 13 within BC3/4. In the final classification, 41, 56, 29, 35 and 34 BCs were 
identified as types 1, 2, 3a, 3b and 4, respectively (Extended Data Fig. 6). Cells that 
violated the mosaic of all types (7) or had irregular stratification profiles (9) were 
discarded as possible reconstruction errors or amacrine cells. 

Contact analysis. Edges of the affinity graph connecting BC with SAC voxels were 
defined as BC-SAC contact edges. For each pair, the sum of the edges yielded an 
estimate of contact area. The Euclidean distance separating each BC-SAC pair was 
computed after projecting their centres onto the SAC plane. Centres of SAC somata 
were manually annotated, and centres of BC arborizations were computed as the 
centroids of their surface voxels. The pairs were binned by distance of the BC from 
the SAC soma. For every pair in a bin, the fraction of SAC surface area devoted to 
BC-SAC contact within the convex hull of the BC was computed as the ratio of BC- 
SAC contact edges to SAC surface edges within the convex hull. The latter was 
estimated by the number of SAC surface voxels multiplied by a geometric conver- 
sion factor of 1.4 SAC surface edges per surface voxel. (This factor was estimated by 
dividing the total number of SAC surface edges by the total number of SAC surface 
voxels in the volume.) BC-SAC pairs with fewer than 10,000 SAC surface voxels 
inside the hull were excluded from the computation to reduce the effect of fluctua- 
tions. The ratios for BCs of the same type were averaged for each distance bin and 
multiplied by a mosaic overlap factor to yield the values in Fig. 4d. The mosaic overlap 
factor represents the extent to which neighbouring convex hulls overlap one another, 
which varies by cell type. This factor was computed by dividing the sum of the hull 
areas for each cell by the area of the union of hulls for each cell type. For absolute
rather than fractional areas, edges in the affinity graph were converted to area in µm²,
using the conversion factor of 291.5 nm² per edge. This factor averages over the
different edge orientations and compensates for voxelization effects. A result very 
similar to Fig. 4d can also be obtained by an alternative method that is simpler but 
does not yield error bars (Extended Data Fig. 7c). 
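
The contact-fraction computation described in this section can be sketched as follows. The factor of 1.4 surface edges per voxel and the 10,000-voxel exclusion threshold come from the text; the function names and the layout of the per-pair input are assumptions made for illustration.

```python
SURFACE_EDGES_PER_VOXEL = 1.4    # geometric conversion factor quoted in the text
MIN_SAC_SURFACE_VOXELS = 10_000  # pairs below this are excluded


def contact_fraction(contact_edges, sac_surface_voxels):
    """Fraction of SAC surface inside one BC hull devoted to contact; None if excluded."""
    if sac_surface_voxels < MIN_SAC_SURFACE_VOXELS:
        return None
    return contact_edges / (SURFACE_EDGES_PER_VOXEL * sac_surface_voxels)


def mosaic_overlap_factor(hull_areas, union_area):
    """Sum of the individual hull areas divided by the area of their union."""
    return sum(hull_areas) / union_area


def binned_contact(pairs, overlap_factor):
    """Average the per-pair ratios in one distance bin, then scale by the overlap factor.

    `pairs` is a list of (contact_edges, sac_surface_voxels) tuples for one BC type.
    """
    ratios = [contact_fraction(edges, voxels) for edges, voxels in pairs]
    kept = [ratio for ratio in ratios if ratio is not None]
    if not kept:
        return None
    return overlap_factor * sum(kept) / len(kept)


example_pairs = [(1400, 20_000), (2800, 20_000), (5, 100)]  # last pair is excluded
print(binned_contact(example_pairs, mosaic_overlap_factor([3.0, 4.0], 5.0)))
```

As a rough consistency check on the overlap factor, the BC1 row of Extended Data Fig. 6f gives a mean hull area of 382 µm² and a density of 3,652 cells per mm²; their product, 382 × 10⁻⁶ mm² × 3,652 per mm², is about 1.40, which appears to correspond to the 40% overlap listed there.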
Co-stratification analysis. All SAC surface voxels were binned by distance from 
the soma centre in the SAC plane. Within each bin, the stratification profile was 
computed as for the BCs. The quartiles (median and 25th and 75th percentiles) 
are graphed in Fig. 5a. The prediction of contact from co-stratification is based on 
the following formalism. 

We define the arborization density $\rho_a(\mathbf{r})$ as the surface area per unit volume at
location $\mathbf{r}$ of a type $a$ cell with soma centred at the origin. Its integral

$$\int dx\,dy\,dz\,\rho_a(\mathbf{r})$$

is the total surface area of the arborization. We assume that the contact density
received by one cell of type $a$ from all cells of type $b$ is equal to

$$c_{ab}(\mathbf{r}) = \rho_a(\mathbf{r}) \sum_i \rho_b(\mathbf{r} - \mathbf{r}_{bi}) \tag{3}$$

The sum over the $b$ mosaic can be approximated by a function that is independent
of $x$ and $y$,

$$\sum_i \rho_b(\mathbf{r} - \mathbf{r}_{bi}) \approx \sigma_b\, s_b(z) \tag{4}$$

where $\sigma_b$ is the number of type $b$ neurons per retinal area and

$$s_b(z) = \int dx\,dy\,\rho_b(x, y, z) \tag{5}$$

is the stratification profile of a cell of type $b$. The SAC arborization density is
assumed radially symmetric,

$$\rho_{\mathrm{SAC}}(\mathbf{r}) = \rho_{\mathrm{SAC}}\!\left(\sqrt{x^2 + y^2},\, z\right),$$

where $\rho_{\mathrm{SAC}}(r, z)$ can be regarded (up to normalization) as the SAC stratification
profile as a function of distance $r = \sqrt{x^2 + y^2}$ from the SAC soma. Integrating the
contact density (3) and normalizing yields the fraction $\phi_b(r)$ of SAC area contacted
by cell type $b$ as a function of $r$,

$$\phi_b(r) = \sigma_b\, \frac{\int dz\, \rho_{\mathrm{SAC}}(r, z)\, s_b(z)}{\int dz\, \rho_{\mathrm{SAC}}(r, z)} \tag{6}$$
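
Equation (6) can be evaluated numerically once the SAC density at a given radius and the BC stratification profile are sampled on the same depth bins. The sketch below is a minimal discretization under that assumption. The function and argument names are hypothetical, and the bin width cancels between numerator and denominator, so it does not appear.

```python
def predicted_contact_fraction(rho_sac_at_r, s_b, sigma_b):
    """Discrete form of equation (6) for one radius r.

    rho_sac_at_r : SAC arborization density sampled over depth bins at radius r
    s_b          : stratification profile of BC type b on the same depth bins
    sigma_b      : number of type b cells per retinal area
    """
    if len(rho_sac_at_r) != len(s_b):
        raise ValueError("profiles must share the same depth bins")
    denominator = sum(rho_sac_at_r)
    if denominator == 0:
        raise ValueError("SAC density is zero at this radius")
    numerator = sum(rho * s for rho, s in zip(rho_sac_at_r, s_b))
    return sigma_b * numerator / denominator


# Toy profiles in arbitrary units: full overlap in depth versus none.
print(predicted_contact_fraction([0.0, 1.0, 1.0, 0.0], [0.0, 2.0, 4.0, 0.0], sigma_b=0.5))
print(predicted_contact_fraction([1.0, 0.0], [0.0, 1.0], sigma_b=0.5))
```

The prediction vanishes when the two profiles occupy disjoint depths, which is the sense in which co-stratification is necessary for predicted contact.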


45. Jeon, C.-J., Strettoi, E. & Masland, R. H. The major cell populations of the mouse 
retina. J. Neurosci. 18, 8936-8946 (1998). 
46. Snyder, J. P. Map Projections-A working Manual 1395 (USGPO, 1987). 
Extended Data Figure 1 | EyeWire screenshots. a, Numerical score after
gameplay of a cube, with leaderboard below. b, Overview mode with neuron
under reconstruction (centre), global chat (bottom left), progress bar for
neuron (top left), leaderboard (right), settings and help (bottom right).
c, Tutorial play.
Username* (free text)
Gender* 
Male/Female 
Age* (free text)
Location* 
City, State/Province 
Country 
Are you... 
White or Caucasian 
Asian 
African American or Black 
American Indian or Alaska Native 
Hispanic 
Pacific Islander 
Education* 
Middle School 
High School - current student 
High School 
Some College - current student 
Some College - not currently a student 
Finished College (Undergrad) 
Some Graduate School - current Masters student 
Masters -- Finished Degree 
Some Graduate School - current PhD student 
PhD -- Finished Degree 
MD/DO 
Occupation* (free text) 
Do you have prior experience in neuroscience?** 
Yes/No 
If yes, please explain.** 
How long do you play EyeWire each week?* 
Less than 1 hour/More than 1 hour 
If you play for more than 1 hour per week, how long do you play? 
1 to 2 hours 
3 to 5 hours 
6 to 10 hours 
11 to 20 hours 
21 to 30 hours 
31 to 40 hours 
41 to 50 hours 
More than 50 hours 
What scientific purpose does EyeWire serve? (free text) 
Why do you play EyeWire? (free text) 
How did you discover EyeWire? (free text) 
If you could add one feature to EyeWire, what would it be? (free text) 
Anything else you would like to add? (free text) 


Survey launch date: April 14, 2013. *required question, ‘question added on 7/7/2013 


Extended Data Figure 2 | Questionnaire administered to EyeWirers. 
Extended Data Figure 3 | EyeWire demographics. a, b, Data based on 729
responses to the questionnaire in Extended Data Fig. 2. Age distribution of
(a) all respondents and (b) those among the top 100 players ranked by number
of cubes submitted. c, Gender distribution of all respondents and those among
the top 100 players. d, Distribution of educational levels.
Extended Data Figure 4 | Entirety of reconstructed SACs. Only the central
region of this plexus of SAC dendrites is portrayed in Fig. 3b. Scale bar, 50 µm.
Extended Data Figure 5 | Clustering procedure for BCs. a, Cells were
divided by the 75th percentile of their stratification profiles. b, The shallow
cluster BC1/2 was separated into BC1 and BC2 using stratification width,
defined as the difference between 75th and 25th percentiles. c, The deep cluster
BC3/4 was divided by 10th percentile into BC4 and BC3. d, BC3 was divided by
axonal volume to yield BC3a and BC3b. Scatter plots of the BC1/2 (e) and
BC3/4 (f) divisions show swaps made to eliminate mosaic violations.
No swaps between BC1/2 and BC3/4 were needed.
Type | n | Hull area (µm²) | Density (cells per mm²) | Overlap
BC1 | 41 | 382 ± 104 | 3,652 | 40%
BC4 | 34 | 266 ± 114 | 4,303 | 15%

Extended Data Figure 6 | Mosaics of Off BC types. a-e, Reconstructed BCs of
types 1, 2, 3a, 3b and 4 (a through e, respectively). BC1/2 mosaics appear
complete. BC3/4 mosaics show some gaps, probably because some thin axons
were missed in the INL (Methods). Scale bar, 50 µm. f, Statistics of BC types.
Means and standard deviation of the hull area (area of the convex hull around
the cell) are in µm². Type densities are the number of cells (n) divided by the
area of the union of hulls of that cell type, and are in cells per mm² without
compensation for tissue shrinkage (Methods). Our densities resemble those of
Wässle et al.°, who found 2,233, 3,212, 1,866, 3,254 and 3,005 cells per mm²,
Extended Data Figure 7 | Alternative contact analysis. Analysis based on
summing over BC-SAC pairs rather than averaging as in the main text. a, Total
BC-SAC contact versus distance from the SAC soma. b, Total SAC area within
the union of convex hulls of each BC type versus distance. The peak at 80 µm is
the location of maximum dendritic branching. The sharp decrease at larger
distances is due to thinning and termination of branches. The graphs differ
across BC types, which in our sample do not cover exactly the same retinal
areas. c, Fraction of SAC area in contact with BC types, estimated by dividing
contact area (a) by SAC area (b). This estimate is similar to that of Fig. 4d, but
lacks error bars. d, Fraction of SAC area contacted by all BC types, the sum
of the contact fractions in c. Also plotted is the contact predicted by
co-stratification, the sum of the curves from Fig. 5b.


©2014 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 8 | Proximity versus contact. Neurons that
intermingle may or may not contact each other. a, b, Type 2 (a) and 3a BCs
(b) contacting SACs. The cells are roughly 24 and 21 µm wide, respectively.
c, d, Other SACs are well within the arborizations of the same two BCs, yet
make no contact at all.
Extended Data Figure 9 | Model direction selectivity index (DSI) versus
stimulus speed. The graphs are for travelling sine waves of various wavelengths
λ (units of Δx). Speed is in units of Δx/t. The preferred speed (horizontal
location of each peak) is λ/(2π). Note that responses are cut off at high speeds by
the temporal filters of the model, but the DSI can decay more slowly.
c-kit+ cells minimally contribute
cardiomyocytes to the heart


Jop H. van Berlo?*, Onur Kanisicak", Marjorie Maillet!, Ronald J. Vagnozzi', Jason Karch’, Suh-Chin J. Lin’, Ryan C. Middleton’, 


Eduardo Marban? & Jeffery D. Molkentin'* 


If and how the heart regenerates after an injury event is highly debated. c-kit-expressing cardiac progenitor cells have 
been reported as the primary source for generation of new myocardium after injury. Here we generated two genetic 
approaches in mice to examine whether endogenous c-kit+ cells contribute differentiated cardiomyocytes to the heart
during development, with ageing or after injury in adulthood. A complementary DNA encoding either Cre recombinase
or a tamoxifen-inducible MerCreMer chimaeric protein was targeted to the Kit locus in mice and then bred with reporter
lines to permanently mark cell lineage. Endogenous c-kit+ cells did produce new cardiomyocytes within the heart,
although at a percentage of approximately 0.03 or less, and if a preponderance towards cellular fusion is considered, the
percentage falls to below approximately 0.008. By contrast, c-kit+ cells amply generated cardiac endothelial cells. Thus,
endogenous c-kit+ cells can generate cardiomyocytes within the heart, although probably at a functionally insignificant level.


The adult mammalian heart was originally proposed to be essentially 
incapable of renewal after injury or with ageing; although some recent 
studies have shown that the heart is capable of new cardiomyocyte for- 
mation with varying degrees of regenerative potential’. The concept that 
stem cells are the source for cardiomyocyte regeneration arose from initial 
observations in which bone-marrow-derived c-kit+ haematopoietic stem
cells (HSCs) showed restoration of the myocardium after infarction injury
when given exogenously”. However, subsequent studies demonstrated
that HSCs possessed essentially no ability to make cardiomyocytes, calling
into question these earlier reports**, at which time the field shifted
to a focus on endogenous c-kit+ cardiac progenitor cells (CPCs) residing
within the myocardium’. Such cells isolated from the rat heart were
reported to differentiate into cardiomyocytes, smooth muscle cells and
endothelial cells, even after clonal derivation, and when injected into the
infarct region they produced substantial new myocardium®. Mouse and
human c-kit+ CPCs were also isolated and marked, and after injection
into an infarcted mouse heart, were shown to generate substantial levels
of labelled cardiomyocytes, capillaries and fibroblasts’. More recently,
resident c-kit+ CPCs were reported to be both necessary and sufficient
for complete repair and functional restoration of the myocardium after
isoproterenol-induced cardiomyocyte killing, whereas bone-marrow-derived
c-kit+ cells had no regenerative effect*. However, other studies
with adult cardiac resident c-kit+ cells have reported the opposite: that
these cells do not possess the ability to generate cardiomyocytes in vivo**””.
To address ongoing controversy, we generated mice in which the Kit
locus was used for lineage tracing analysis to examine if and how frequently
c-kit+ cells generate cardiomyocytes in vivo.


c-kit+ contribution to the growing heart

The Kit locus was targeted with a cDNA encoding Cre recombinase fused
to an internal ribosome entry sequence (IRES) to concurrently express
enhanced green fluorescent protein (eGFP) tagged with a nuclear localization
signal (nls) (Fig. 1a). These Kit+/Cre mice were bred to LoxP site-dependent
Rosa26-CAG-loxP-STOP-loxP-eGFP (R-GFP) reporter mice
to irreversibly mark any cell that previously or currently expresses this
Kit locus (Fig. 1a). Four to twelve weeks after birth, the fidelity of the
genetic system was assessed in comparison with known domains of
c-kit protein expression, such as melanocytes of the skin, Leydig cells
in the testis, interstitial cells of the intestine, lung and wide areas of the
spleen, all of which showed eGFP cellular labelling (Fig. 1b and Extended
Data Fig. 1a)'!". In bone marrow, 83% of the c-kit-antibody-detected
cells were eGFP+ by standard fluorescence-activated cell sorting (FACS)
analysis (Fig. 1c), while imaging cytometry analysis detected coincident
eGFP+ expression and c-kit immunoreactivity in 88% of the bone marrow
cells and 76% of the non-myocyte fraction from the heart (Fig. 1d, e).
To further verify the specificity of the Kit-Cre allele we examined real-time
eGFPnls expression in the heart, ileum and skeletal muscle for
co-expression of c-kit protein (antibody), which was always coincident
(Fig. 1f, g and Extended Data Fig. 1b, c). In bone marrow, 94% of the
eGFP+ cells were Lin−, indicating a high degree of fidelity with the Kit-Cre
allele (Extended Data Fig. 1d). In the heart c-kit-antibody-positive
mononuclear cells were predominantly eGFP+ at 4 weeks of age using
the Kit+/Cre × R-GFP reporter strategy, whereas in testis recombination
was only observed in Leydig cells, of which >80% were eGFP+
(Extended Data Fig. 1e, f). Thus, the specificity of the Kit-Cre allele appears
identical with known regions of c-kit protein expression in vivo.

In an exhaustive search by histological methods across four hearts
from Kit+/Cre mice for current eGFPnls expression at 4 weeks of age,
no eGFP+ cardiomyocytes or endothelial cells were identified (only
mononuclear CPC-like cells were observed), strongly suggesting that
the Kit locus is not spontaneously activated in differentiated cell types
of the heart (Fig. 1f). However, in conjunction with the R-GFP reporter
allele for ongoing c-kit lineage tracing, the myocardium showed many
eGFP+ differentiated cell types, although cardiomyocytes were very rare
(Fig. 1h, i). Even more rarely, areas suggestive of cardiomyocyte clonal
expansion were identified (Fig. 1i). No eGFP+ cells were observed in
hearts of single R-GFP mice (data not shown). To more rigorously
quantify the extent of cardiomyocyte recombination-based labelling,
hearts were disassociated and eGFP+ cells were directly counted (Fig. 1j),
revealing a level of 0.027% myocytes from the c-kit lineage (Fig. 1k). This
low percentage was confirmed by PCR analysis for DNA recombination
at the Rosa26 locus from purified cardiomyocytes versus spleen (Fig. 1l).

Figure 1 | Kit-Cre lineage tracing. a, The Kit locus was targeted in mice to
express Cre recombinase and eGFP with a nuclear localization sequence
(eGFPnls) behind an internal ribosome entry site (IRES). These mice were
crossed with Rosa26 reporter mice (R-GFP) for lineage tracing. b, Diagram of
mice used for all experimentation in this figure. c, Representative FACS plot of
bone marrow from Kit+/Cre × R-GFP mice gated for c-kit antibody, then eGFP
fluorescence to reflect recombination of the R-GFP locus (representative of
n = 6 mice). d, Direct imaging cytometry analysis of eGFP expression in bone
marrow (averages from n = 3 mice, *P < 0.05 versus R-GFP). e, Same
quantitative imaging cytometry analysis as in d except the non-myocytes were
isolated from hearts of Kit+/Cre × R-GFP mice (averages from n = 3 hearts,
*P < 0.05 versus R-GFP). f, Representative cardiac immunohistochemistry to
show current expression from the Kit-Cre allele (green, eGFPnls) versus
endogenous c-kit protein detected by antibody (Ab, red). The inset box shows
two mononuclear c-kit expressing cells. g, Quantification of average number of
c-kit+ cells per longitudinal heart section (n = 4 hearts). h, Representative
histological section at two magnifications (white box) of a Kit+/Cre × R-GFP
mouse heart with desmin antibody in red, eGFP antibody in green, and nuclei
in blue. The arrow shows an eGFP+ cardiomyocyte. i, Representative
immunohistological image showing a rare area of cardiomyocyte clonal
expansion (arrow) (n = 6 hearts analysed). j, Image of cells disassociated from
the hearts of Kit+/Cre × R-GFP mice (n = 3 hearts analysed). White arrow
shows a rare eGFP fluorescing cardiomyocyte, black arrowheads show eGFP
fluorescent non-myocytes. k, Quantification of eGFP+ fluorescent
cardiomyocytes (81 from 303,264 total cardiomyocytes, 3 hearts, *P < 0.05
versus R-GFP). l, DNA electrophoresis after PCR showing Cre-mediated
Rosa26 locus recombination in semi-purified cardiomyocytes and spleens
(n = 2 Kit+/Cre × R-GFP mice). All error bars represent s.e.m.


c-kit+ non-myocyte lineage analysis

Hearts of Kit+/Cre × R-GFP mice at 4-12 weeks of age were further
examined to identify the remaining eGFP+ non-myocytes. Examples
of eGFP labelling co-incident with fibroblasts (vimentin co-labelling),
endothelial cells (CD31, CD34, von Willebrand factor (vWF)), immune
cells (CD3, CD45) and, rarely, smooth muscle α-actin (α-SMA)-expressing
cells, were identified, although the most prevalent co-localizations were
with CD31-, CD45- or CD34-positive cells (Fig. 2a-g). Indeed, using
a cocktail of antibodies for CD31, CD45, CD34 and CD3, versus sarcomeric
α-actin, we were able to account for almost all eGFP+ non-myocytes
in the hearts of adult Kit+/Cre × R-GFP mice, either when
analysed from histological sections or as dissociated individual cells
(Extended Data Fig. 2a-c). FACS analysis showed that 18% and 77% of the
total eGFP+ non-myocytes in the heart were CD45 or CD31 positive,
respectively (Fig. 2h, i). Confocal microscopy analysis showed exact
co-localization between eGFP+ cells in the heart and CD31 protein expression,
but not with NG2 staining for pericytes (Fig. 2j).

We also collected Kit+/Cre × R-GFP mice at birth (postnatal day (P)0)
to analyse the contributions of c-kit+ cells to the heart during embryonic
and fetal development (Extended Data Fig. 3a). Control histological
sections from the ileum and lung showed the expected distribution of
c-kit+ cells (Extended Data Fig. 3b), and the heart also showed numerous
eGFP+ cells throughout (Extended Data Fig. 3c). Immunohistochemical
analysis of the P0 heart with a sarcomeric cardiomyocyte marker showed
that nearly all of the eGFP+ cells were non-myocytes, although definable
cardiomyocytes were clearly present at very low levels, including rare
areas of cardiomyocyte clonal expansion (Extended Data Fig. 3d-g).

Figure 2 | Analysis of cardiac cells from Kit+/Cre × R-GFP mice.
a-g, Representative immunofluorescent images of heart histological sections
from Kit+/Cre × R-GFP mice at 4 weeks of age stained with eGFP antibody
(green), nuclei in blue and either CD31, vWF, CD34, CD3, CD45, vimentin
or α-SMA in red. Arrows show cells with overlap in staining (n = 3 hearts
examined). h, i, FACS plot showing lineage markers of heart-isolated
c-kit-derived eGFP+ cells for CD45 (h) and CD31 (i) (representative of n = 6
for CD45 at 4 weeks of age, and n = 3 for CD31 at 12 weeks of age).
j, Representative immunofluorescent image from heart histological section
of a Kit+/Cre × R-GFP mouse at 4 weeks for eGFP fluorescence (green),
CD31 antibody staining (blue) and NG2 antibody staining (red). Right
panel shows composite with transmitted light (n = 2 hearts analysed).


c-kit+ lineage tracing in adult heart

To specifically address the question of new cardiomyocyte formation
within the adult heart, we generated a mouse model in which the
tamoxifen-inducible MerCreMer protein was targeted to the Kit locus
(Kit+/MCM), followed by cross breeding with the R-GFP reporter line
(Fig. 3a). To verify the fidelity of this system, Kit+/MCM × R-GFP mice
were given tamoxifen during postnatal maturation for approximately
4 weeks followed by collection of tissues with known sites of c-kit expression
(Extended Data Fig. 4a). Kit+/MCM × R-GFP mice showed ~70%
overlap in recombination-dependent eGFP expression and endogenous
c-kit protein in Leydig cells of the testis (Extended Data Fig. 4b).
Importantly, no eGFP+ cells were observed in the absence of tamoxifen
at any age examined or after myocardial infarction injury, demonstrating
that the MerCreMer system does not ‘leak’ (Extended Data
Fig. 4c). Kit+/MCM × R-GFP mice were also given tamoxifen from day 1
through 6 months of age for continuous labelling (Fig. 3b), which produced
eGFP expression in greater than 60% of bone marrow cells, but
again no signal in the absence of tamoxifen (Fig. 3c-e). Histological
analysis of the heart after 6 months of labelling showed rare examples
of eGFP+ adult cardiomyocytes and a relatively large number of
non-myocytes (Fig. 3f, g). Careful analysis of the non-myocyte fraction in
these hearts showed fibroblasts (rarely), smooth muscle cells (rarely),
endothelial cells and immune cells, with the majority again being CD31+
(Extended Data Fig. 5a-g). Myocardial infarction injury also doubled
the number of CD31+ cells that were eGFP+ in the adult heart with
8 weeks of prior tamoxifen labelling (Extended Data Fig. 5h). We also
conducted c-kit lineage labelling from 6-12 weeks of age, just after the
postnatal developmental period (Fig. 3h). Upon disassociation of these
hearts we observed 0.0055% eGFP+ adult cardiomyocytes (Fig. 3i, j),
confirmed as extremely low by PCR and quantitative PCR (qPCR) for
Rosa26 locus recombination (Extended Data Fig. 6a-c).

Figure 3 | Inducible Cre expression from the Kit locus shows limited adult
cardiomyocyte formation. a, Genetic cross between Kit+/MCM and R-GFP
reporter mice to lineage trace c-kit-expressing cells when tamoxifen is present.
b, Schematic showing tamoxifen treatment between day 1 and 6 months of
age (panels c-g). c, d, Representative FACS plots with c-kit antibody versus
eGFP from bone marrow of Kit+/MCM × R-GFP mice without (c) or with
(d) tamoxifen. e, FACS quantification of eGFP+ cells from bone marrow
of these mice (average from n = 2 mice for R-GFP and n = 4 for
Kit+/MCM × R-GFP). *P < 0.05 versus R-GFP. f, g, Representative heart
sections from Kit+/MCM × R-GFP mice showing c-kit+ lineage cells in green
and cardiomyocytes in red (desmin antibody). White arrow indicates eGFP+
adult cardiomyocyte (n = 3 mice analysed). h-j, Tamoxifen treatment of
Kit+/MCM × R-GFP mice between 6-12 weeks of age followed by disassociation
of cells from the hearts of these mice in h (white arrow in i shows rare
cardiomyocyte) that is quantified in j (127,284 cardiomyocytes across two
hearts, 7 were eGFP+, *P < 0.05 versus R-GFP). k-n, Tamoxifen treatment of
Kit+/MCM × R-GFP mice between 8 and 14 weeks of age with myocardial
infarction (MI) on week 10 (n = 3 mice analysed). l, Immunohistological heart
section for desmin (red) and eGFP (green) with nuclei in blue (arrow shows a
cardiomyocyte from the c-kit+ lineage). m, n, Disassociated cardiomyocytes
show rare but definitive myocyte labelling (white arrow), which was quantified
in n (225,760 cardiomyocytes from 2 myocardial infarction-injured hearts, 37
were eGFP+, *P < 0.05 versus R-GFP). o, p, Tamoxifen treatment between 8
and 12 weeks of age with myocardial infarction injury occurring 3 days after
tamoxifen cessation. p, Average number of eGFP+ cardiomyocytes from
histological sections taken across the entire heart (n = 2 hearts, >50 sections
each) *P < 0.05 versus R-GFP. All error bars represent s.e.m.

Cardiac injury increases cellular turnover in the heart, hence we subjected
Kit+/MCM × R-GFP mice to myocardial infarction at 10 weeks of
age during a 6-week tamoxifen-labelling protocol (Fig. 3k and Extended
Data Fig. 6d-f). The percentage of eGFP+ cardiomyocytes increased to
0.016% within the heart, with more being localized to the infarct border
zone (Fig. 3l-n). c-kit+ lineage cells within the heart were also pre-labelled
by giving tamoxifen only before myocardial infarction injury,
which again showed a very low percentage of eGFP+ cardiomyocytes
(Fig. 3o, p). Percentages of eGFP+ cardiomyocytes in the heart during
4 weeks of isoproterenol infusion-induced injury were 0.007% (Extended
Data Fig. 7a-c). These astonishingly low values of cardiomyocyte formation
were independently verified using blinded heart histological sections
from Kit+/MCM × R-GFP mice sent to an outside academic laboratory
(Extended Data Fig. 8a-c).
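
The percentages quoted in this section follow directly from the raw cardiomyocyte counts given in the legends of Figs 1k, 3j and 3n. A minimal sketch of that arithmetic is shown here; the function name and the dictionary labels are introduced for illustration only.

```python
def labelled_percentage(labelled: int, total: int) -> float:
    """Percentage of counted cardiomyocytes that were eGFP+."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * labelled / total


# (eGFP+ cardiomyocytes, total cardiomyocytes) as quoted in the figure legends.
COUNTS = {
    "Fig. 1k, constitutive Kit-Cre": (81, 303_264),
    "Fig. 3j, tamoxifen at 6-12 weeks": (7, 127_284),
    "Fig. 3n, tamoxifen with MI": (37, 225_760),
}

for label, (labelled, total) in COUNTS.items():
    print(f"{label}: {labelled_percentage(labelled, total):.4f}%")
```

Rounded, these reproduce the 0.027%, 0.0055% and 0.016% reported in the text.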

Finally, we also cultured total non-myocytes from the hearts of young
adult Kit+/Cre × R-GFP mice in the presence of dexamethasone as a
means of pushing c-kit+ cells with progenitor-like activity towards
the cardiomyocyte lineage (Extended Data Fig. 9). The data show that
eGFP+, Kit-Cre allele expressing cells are fully capable of inducing
expression of the cardiac markers GATA4, α-actinin and troponin T,
suggestive of partial differentiation towards the cardiomyocyte lineage
(sarcomeres were not observed).


c-kit+ cells fuse in the heart


Hearts from Kit+/MCM × R-GFP mice showed the presence of cells from
blood lineages (CD3, CD45, CD34), which are known to have fusigenic
activity with resident parenchymal cells. To examine fusion we used
a genetic strategy that constitutively expresses a membrane-targeted
fluorescent tdTomato protein from the Rosa26 locus. Upon Cre-mediated
recombination, tdTomato fluorescence is lost and a membrane-targeted
eGFP becomes expressed (abbreviated mT/mG for membrane-targeted
tdTomato and eGFP, respectively) (Fig. 4a). If cells fuse, both signals
would be present but a de novo cardiomyocyte from a c-kit+ lineage
cell would be only green. Experimentally, Kit+/MCM × mT/mG mice
were given tamoxifen for 2 weeks (8-10 weeks of age) and subjected to
myocardial infarction 3 days later, followed by collection at 1, 2 and
4 weeks thereafter (Fig. 4b). Control mice were collected before myocardial
infarction but after tamoxifen (time 0). Percentages of total cardiomyocyte
membrane-eGFP labelling, whether from fusion or not, were approximately
0.01% at all three time points after myocardial infarction (Fig. 4c).
Although some de novo cardiomyocytes were identified in the heart (eGFP
only), the majority (80-88%) retained the membrane-tdTomato label,
indicating that these cells probably arose by fusion (Fig. 4d-f). Thus,
c-kit+ lineage cells can generate cardiomyocytes in the heart, although
at ~5-fold lower values than initially predicted.
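
The mT/mG read-out described above is a two-signal decision rule, and the "~5-fold lower" estimate follows from the fused fraction. The sketch below illustrates both under simple assumptions: the function names are hypothetical, and the inputs are the rounded values quoted in the text (about 0.01% total labelling, 80-88% of labelled cells retaining tdTomato), not the underlying raw data.

```python
def classify_cardiomyocyte(has_egfp: bool, has_tdtomato: bool) -> str:
    """Interpret membrane reporter signals in an mT/mG cardiomyocyte."""
    if has_egfp and has_tdtomato:
        return "fusion"            # recombined and unrecombined reporter in one cell
    if has_egfp:
        return "de novo"           # only the recombined (eGFP) reporter is present
    if has_tdtomato:
        return "unrecombined"      # no Cre activity in this cell's history
    return "no reporter signal"


def de_novo_percent(total_egfp_percent: float, fused_fraction: float) -> float:
    """Percentage of cardiomyocytes that are eGFP+ and have lost tdTomato."""
    return total_egfp_percent * (1.0 - fused_fraction)


total_labelled = 0.01  # % of cardiomyocytes with membrane eGFP (approximate)
for fused in (0.80, 0.88):
    remaining = de_novo_percent(total_labelled, fused)
    fold_lower = total_labelled / remaining
    print(f"fused {fused:.0%}: de novo = {remaining:.4f}%, {fold_lower:.1f}-fold lower")
```

At the 80% end of the reported range the de novo fraction is one fifth of all labelled cardiomyocytes, which corresponds to the ~5-fold reduction stated in the text; at 88% the reduction is larger.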


Kit-Cre locus is not ectopically induced 


One concern with the Kit allele-based lineage tracing approach is that
if this locus ever becomes activated ectopically in a cardiomyocyte, that
cell would be wrongly ascribed as having come from a c-kit+ cell. It was
previously shown that knockdown of the Kit gene in naturally occurring W
and Wv mutant mice results in defective progenitor cell activity in many
tissues. Indeed, hearts from Kit mutant mice showed a marked reduction
in resident mononuclear c-kit+ cells and progenitor activity. Hence,
Kit null mice should lack the ability to generate eGFP+ cardiomyocytes
in the heart if they indeed arise from c-kit+ cells with progenitor-like
activity, instead of having arisen from ectopic Kit allele induction in a
rare population of differentiated cardiomyocytes.

Kit null mice were generated by placing the Kit-Cre allele over the
Kit-MerCreMer allele. Although these mice die at birth, viable nulls at
embryonic days 16.5 and 18.5 were identified and examined (Fig. 4g-i).
Fourteen total eGFP+ cardiomyocytes were counted from four Kit+/Cre
× R-GFP and one Kit+/Cre × mT/mG embryos across 56 histological
sections spanning the heart (Fig. 4j, l). However, hearts from two KitMCM/Cre
× R-GFP and one KitMCM/Cre × mT/mG embryos (nulls) showed lower
total eGFP+ cells in the heart and no cardiomyocytes across 69 histological
sections (Fig. 4i, k, m). Importantly, KitMCM/Cre embryos showed
no c-kit protein expression, confirming their null status (Fig. 4n). Taken
together, these data indicate that eGFP+ cardiomyocytes that are lineage
traced with the Kit-Cre allele are not due to inappropriate activation of
the Kit gene for even a brief period of time in rare existing cardiomyocytes,
but rather they arose either by transdifferentiation from c-kit+
lineage precursor cells or by fusion.

Figure 4 | Assessment of fusion versus de novo cardiomyocyte formation in
the heart. a, Genetic strategy in which Kit+/MCM mice were crossed with
Rosa26 targeted mice containing the membrane targeted tdTomato/eGFP
(mT/mG) reporter. b-f, Tamoxifen was given to Kit+/MCM × mT/mG mice
between 8 and 10 weeks, followed 3 days later by myocardial infarction injury.
c, Quantification across >50 histological sections of all eGFP+-expressing
cardiomyocytes (averages) before myocardial infarction (n = 4 hearts) and 1
(n = 4 hearts), 2 (n = 5 hearts) and 4 (n = 3 hearts) weeks after myocardial
infarction injury. Error bars represent s.e.m., *P < 0.05 versus mT/mG.
d, Example of a c-kit-lineage-derived de novo cardiomyocyte in which
membrane-eGFP (green, left) is expressed and tdTomato fluorescence (red,
right) is lost. e, Example of eGFP+ cardiomyocyte (green) that still contains
endogenous membrane-tdTomato fluorescence (red), indicating fusion. Nuclei

(h)), KitMCM/Cre × R-GFP (null, n = 2 (i)) or KitMCM/Cre (null, no reporter,
n = 2 (g)). Red staining is α-actinin and green is eGFP. j, Higher magnification
image from h, showing a definitive eGFP+ cardiomyocyte (arrow). k, Higher
magnification image from i, which shows only eGFP+ non-myocytes in Kit null
hearts. l, m, Histological heart images from E18.5 Kit+/Cre (het, n = 1) and
KitMCM/Cre (null, n = 1) embryos containing the mT/mG reporter, again only
the heterozygotes show examples of eGFP+ cardiomyocytes (arrow).
n, Western blot showing loss of c-kit protein in KitMCM/Cre embryos (nulls)
versus heterozygous controls.


Discussion 


The original hypothesis that c-kit+ cells have the ability to contribute
to the cardiomyocyte compartment of the heart, as well as other cell types,
is correct as determined by the lineage tracing technique used here. Indeed,
the observation that embryonic and postnatal labelling in the hearts
of Kit+/Cre × R-GFP mice shows definable regions with cardiomyocyte
clonal expansion strongly suggests that these c-kit+ cells can make
cardiomyocytes in vivo. More importantly, loss of the Kit gene, which
is known to compromise the progenitor and migration activity of c-kit+
cells, completely prevented cardiomyocyte formation from c-kit+ cells.
However, throughout development, with ageing or with cardiac injury,
the percentage of cardiomyocytes emerging from the c-kit+ lineage was
astonishingly low and hence highly unlikely to ever considerably affect
cardiac function. The mT/mG detection system also supported the existence
of de novo cardiomyocyte formation in the adult heart from the
c-kit+ lineage but at ~5-fold lower levels than initially quantified owing
to prevalent cellular fusion events.

Exogenous c-kit+ cells are currently being used to treat
post-myocardial infarction heart failure patients, and early results have
shown small, albeit significant, functional improvements in the heart.
However, our results suggest that the potential benefit of injecting c-kit+
cells into the hearts of patients is unlikely to be attributable to new
cardiomyocyte formation, hence caution is warranted until the mechanisms
in play are better defined, or until we are able to considerably enhance
the cardiogenic potential of these cells (see Supplementary Discussion).


METHODS SUMMARY 


The Kit allele was targeted in SV129 embryonic stem cells to express either
Cre recombinase alone or a tamoxifen-inducible Cre recombinase referred to as
MerCreMer. Hemizygous targeted mice were crossed with
FVB.Cg-Gt(ROSA)26Sortm1(CAG-lacZ,-EGFP)Glh/J
(previously modified by cross-breeding to B6(C3)-Tg(Pgk1-FLPo)10Sykr/J) or
B6.129(Cg)-Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J. Tissues from these mice
were subjected to histological analysis and immunohistochemistry at multiple ages
and after select treatments. Antibodies used are shown in Supplementary Table 1
(see Methods for more detailed descriptions).

METHODS

Mice. All experiments involving mice were approved by the Institutional Animal
Care and Use Committee (IACUC) at Cincinnati Children’s Hospital. No human
subjects or human material was used. Targeted Kit-Cre-IRES-eGFPnls and
Kit-MerCreMer mice were generated by standard gene targeting techniques.
Homology arms upstream and downstream of the ATG start codon of the Kit gene
in exon 1 were subcloned into a plasmid backbone containing AmpR and a
diphtheria toxin (DTA) cassette through recombineering. A cDNA encoding either
Cre-IRES-eGFPnls (from A. P. McMahon, UCLA) or MerCreMer, as well as an frt
site-flanked neomycin selection cassette, were cloned in-frame with the Kit
ATG start site. Embryonic stem (ES) cells were electroporated with linearized
targeting vector. Targeted clones were identified by Southern blot and PCR. ES
cell aggregation with 8-cell embryos was used to generate chimaeric mice with
the Kit-Cre-IRES-eGFPnls construct, whereas the Kit-MerCreMer mice were
generated by blastocyst injection at the Howard Hughes Medical Institute
(HHMI) gene-targeting core facility (by C. Guo at HHMI, who also generated the
Kit-MerCreMer targeting vector and targeted ES cells). Germline transmitting
male chimaeras were crossed with Rosa26-Flpe females
(B6.129S4-Gt(ROSA)26Sortm1(FLP1)Dym/RainJ) to delete the neomycin cassette and
verified offspring were further backcrossed to C57Bl/6J for five generations.
Reporter mice FVB.Cg-Gt(ROSA)26Sortm1(CAG-lacZ,-EGFP)Glh/J (previously
modified by cross-breeding to B6(C3)-Tg(Pgk1-FLPo)10Sykr/J) and
B6.129(Cg)-Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J were purchased from the
Jackson Laboratories. Kit null mice were generated by breeding male Kit+/Cre
with female Kit+/MCM × R-GFP mice, of which 1:8 embryos are predicted to be
KitMCM/Cre × R-GFP (nulls, with the reporter). Littermates that were
Kit+/Cre × R-GFP were controls to show the full extent of eGFP+
cardiomyocytes that are possible in the heart. Because Kit null mice were not
identified at birth in multiple litters, we collected mice from this cross at
E16.5 and E18.5, which identified viable Kit null embryos. PCR genotyping of
Kit-Cre-IRES-eGFPnls used the following primers: (wt-Kit-forward:
5'-CTGTAGCAGAGAGAGGAGCT-3' and Cre-reverse: 5'-CTACACCAGAGACGGAAATCC-3');
Kit-MerCreMer (MerCreMer-forward: 5'-CTGAACCGCCCATGATCTATT-3' and
MerCreMer-reverse: 5'-GTGGATGTGGTCCTTCTCTTC-3'); Kit (forward:
5'-CTGTAGCAGAGAGAGGAGCT-3' and reverse: 5'-ACAGAGGGTGCAGTCCTCTT-3'). Mice of
various ages were used, as indicated for each experiment. Both male and female
mice were used in all experiments.
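
The predicted 1:8 frequency of Kit null embryos carrying the reporter can be reproduced with a simple Mendelian enumeration. The sketch below rests on two assumptions that are not spelled out in the text: the R-GFP reporter is carried as a single copy by only one parent, and the Kit and Rosa26 loci assort independently. The helper `cross` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a, parent_b):
    """Offspring genotype probabilities at one locus for two diploid parents."""
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        genotype = tuple(sorted((allele_a, allele_b)))
        offspring[genotype] += Fraction(1, 4)  # four equally likely gamete pairs
    return offspring


# Kit locus: one heterozygous Cre parent, one heterozygous MerCreMer (MCM) parent.
kit = cross(("+", "Cre"), ("+", "MCM"))

# Rosa26 locus: assumption - only one parent carries a single R-GFP copy.
rosa = cross(("+", "+"), ("+", "R-GFP"))

p_null = kit[("Cre", "MCM")]                  # both Kit alleles disrupted
p_reporter = rosa[("+", "R-GFP")]             # embryo inherits the reporter
p_null_with_reporter = p_null * p_reporter    # independent assortment assumed

print(f"Kit null: {p_null}, with reporter: {p_null_with_reporter}")
```

Under these assumptions one quarter of embryos are Kit null and half of those inherit the reporter, giving the 1:8 expectation stated in the Methods.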

Animal procedures. Tamoxifen citrate containing chow (Harlan Laboratories)
was used to activate the inducible MerCreMer protein, thereby inducing Cre
recombinase activity. We used the standard 400 mg kg⁻¹ chow for all
experiments, except for labelling right after birth for which we used
200 mg kg⁻¹. The duration of treatment is indicated within each experiment.
Myocardial infarction was induced in mice via permanent surgical ligation of
the left coronary artery. In brief, mice (both sexes) were anaesthetized using
isoflurane and a left lateral thoracotomy was performed. The left coronary
artery was identified and ligated just below the left atrium. After closing
the thoracotomy and expelling residual air, the mice were allowed to recover.
Two-dimensional M-mode echocardiography was performed on mice anaesthetized
with 2% isoflurane, using a Hewlett Packard SONOS 5500 with a 15-MHz
transducer. An average of three measurements was taken for each mouse. Group
sizes were determined from past experience and based on statistical power
calculations, and the number of mice is given in the figure or figure legends.
Isoproterenol treatment was given via osmotic minipumps (Alzet) at 60 mg per kg
per day (in 1 µM ascorbic acid) for 4 weeks. Mice were killed either by CO2
asphyxiation followed by cervical dislocation or by deep isoflurane sedation
with cervical dislocation followed by excision of the heart. Isolated organs
were fixed in 4% paraformaldehyde overnight, then processed for paraffin
embedding or fixed for 3 h and immersed in PBS containing 30% sucrose
overnight before embedding in OCT (Tissue-Tek) for cryo-sectioning.

Cell isolation. We isolated bone marrow cells by flushing femurs and tibiae
with Hanks Balanced Salt Solution (HBSS). In brief, bone marrow was flushed
using a 25-gauge needle attached to a syringe containing 10 ml ice-cold HBSS
supplemented with 2% fetal calf serum (FCS). Cells were spun at 400g for
10 min at 4 °C and pellets were re-suspended in 2% FCS/HBSS. After isolation,
cells were kept on ice and further processed for flow cytometry or DNA
extraction. Adult cardiomyocytes were isolated by removing beating hearts from
anaesthetized mice and cannulating them for retrograde perfusion with modified
Tyrode solution (NaCl 120 mM, KCl 14.7 mM, KH2PO4 0.6 mM, Na2HPO4 0.6 mM,
MgSO4 1.2 mM, HEPES 10 mM, NaHCO3 4.6 mM, taurine 30 mM, glucose 5.5 mM,
butanedione monoxime (BDM) 10 mM, pH 7.40) supplemented with Liberase TH
(Roche). After perfusion, hearts were disassociated into individual
cardiomyocytes, calcium was gradually added back and cells were plated on
laminin-coated cover slips in modified Tyrode solution supplemented with
1 mg ml⁻¹ BDM and immediately counted for eGFP+ cardiomyocytes. After
counting, cells were imaged with a Nikon Eclipse TE300 inverted fluorescence
microscope. Non-cardiomyocytes from the heart were isolated by retrograde
perfusion as described previously. In brief, hearts were perfused with a
digestion buffer (NaCl 126 mM, KCl 4.4 mM, MgCl2 5 mM, Na pyruvate 5 mM,
NaH2PO4 5 mM, creatine 5 mM, HEPES 5 mM, glucose 22 mM, taurine 20 mM)
containing 15 µM CaCl2, collagenase type 2 (Worthington, 274 U ml⁻¹) and
Protease XIV (Sigma-Aldrich, 0.57 U ml⁻¹). Cardiomyocytes were eliminated by
two serial centrifugations at 10g for 5 min at 4 °C and the non-cardiomyocyte
cell fraction was collected after a final centrifugation at 500g for 10 min
at 4 °C.

Flow cytometry. Flow cytometry was performed on bone marrow and non-myocyte
heart fractions using a BD FACSCanto II running FACSDiva software with the
following configuration: 405-nm laser for Alexa 405, 633 nm for APC
(allophycocyanin) and 488 nm for GFP. Voltages were determined using
single-stain and fluorescence minus one (FMO) controls. Analysis was performed
using FlowJo vX. Haematopoietic lineage committed bone marrow cells were
identified and negatively gated using a panel of mouse antibodies (CD3e,
CD11b, CD45R/B220, Ly6G and Ly-6C and TER-119; collectively Lin+). c-kit+
cells were identified by antibody labelling and then plotted for endogenous
eGFP fluorescence. Alternatively, all bone marrow cells were labelled with
c-kit antibody and then plotted for both c-kit positivity and endogenous eGFP
fluorescence. Non-myocytes from the heart were first gated for eGFP
fluorescence and plotted for CD45 or CD31 positivity using antibodies
conjugated to APC for fluorescence intensity separation. Summary of
antibodies used is given in Supplementary Table 1.

Multispectral-imaging flow cytometry. Quantitative real-time c-kit and eGFP
expression in bone marrow and non-cardiomyocyte cells from the hearts of
Kit+/Cre × R-GFP mice was analysed by ImageStreamX (Amnis), a multispectral
flow cytometer combining standard microscopy with flow cytometry. We used the
integrated software INSPIRE to run the ImageStreamX. For each experiment, cells
were fixed and stained for c-kit antibody reactivity and suspended in 100 µl
buffer (cold HBSS with 2% horse serum). Before running the samples, the
ImageStreamX was calibrated using SpeedBeads (Amnis). Samples were acquired
for unlabelled, single-colour fluorescence controls, then the experimental
samples. At least 10,000 experimental cells and 2,000 control cells were
acquired for each sample. Images were analysed using IDEAS image-analysis
software (Amnis). Summary of antibodies used is given in Supplementary Table 1.

Immunohistochemistry. Please refer to Supplementary Table 1 for all antibody
information and dilutions. For paraffin sections, isolated organs were fixed
overnight in freshly diluted 4% paraformaldehyde, dehydrated and sectioned at
5 µm. Following citrate antigen retrieval (BioGenex), the sections were
blocked for 1 h at room temperature in a blocking solution (PBS with 0.1% cold
water fish skin gelatin, 1% bovine serum albumin, 0.1% Tween-20 and 0.05%
NaN3), which was also used to dilute antibodies. For cryosections, isolated
organs were fixed for 3 h in freshly diluted 4% paraformaldehyde at 4 °C,
rinsed with PBS and cryoprotected in 30% sucrose/PBS overnight before
embedding in OCT (Tissue-Tek) and 10-µm cryosections were collected.
Cryosections were blocked for 30 min at room temperature in a blocking
solution (PBS with 5% goat serum, 2% bovine serum albumin, 0.1% Triton X-100),
which was also used to dilute antibodies. Primary antibodies were incubated
overnight at 4 °C, secondary antibodies for 2 h at room temperature, and
washes were performed in PBS. Cryosections were used to visualize native
eGFP or tdTomato fluorescence from the different reporters or from the
IRES-eGFP cassette built into the Kit-Cre allele. 4',6-diamidino-2-phenylindole
(DAPI) was used to stain nuclei (usually in blue). Images were acquired on an
inverted Nikon A1R confocal microscope using NIS Elements AR 4.13. Some images
were further processed in Photoshop or ImageJ to increase brightness/contrast
of individual channels before generating a pseudo-coloured overlay.

Genomic PCR and qPCR. Genomic DNA was prepared from mouse tissues or
isolated mouse cardiomyocytes using the DNeasy Blood & Tissue Kit (Qiagen,
69504) per manufacturer’s instructions. In brief, cells or tissues were
snap-frozen at time of collection then lysed by incubation with proteinase K
for 3 h at 56 °C, followed by spin column purification and elution. Samples
were treated with RNase A to remove contaminating RNA. PCR was performed to
detect recombined and non-recombined Rosa26 reporter alleles using primers
5'-tctgcttcactctccccate (forward, against the CAG promoter/enhancer),
5'-gatcagcagcctctgttccaca (forward, against the PGKNeo cassette) and
5'-cgctgaacttgtggccgtttac (reverse, against eGFP).
PCR conditions were 96 °C for 2 min to separate strands, followed by 34 cycles of
amplification (96 °C for 30 s, 56 °C for 30 s, 72 °C for 30 s) and a 5-min elongation
step at 72 °C. PCR products were visualized on an ethidium bromide-stained
agarose gel using a UV molecular imager (Bio-Rad). To quantify levels of
recombined and non-recombined Rosa26 alleles in genomic DNA, qPCR was
performed using SYBR Green (Applied Biosystems) with the same primers used for
PCR above, and detection with a Bio-Rad CFX96 thermocycler. Simultaneous
reactions using the primers above were performed to detect recombined versus
non-recombined alleles.
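
The cycling conditions given above can be written out as a small program description, which also makes the total programmed hold time explicit. This is only a sketch of the stated parameters: the names `Step`, `CYCLE` and `total_hold_seconds` are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the PCR conditions stated in the Methods.
INITIAL = Step("initial strand separation", 96, 2 * 60)
CYCLE = (
    Step("denature", 96, 30),
    Step("anneal", 56, 30),
    Step("extend", 72, 30),
)
N_CYCLES = 34
FINAL = Step("final elongation", 72, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


minutes, seconds = divmod(total_hold_seconds(), 60)
print(f"Programmed hold time: {minutes} min {seconds} s")
```

The programmed holds add up to 3,480 s (58 min), so a run takes roughly an hour before ramp times are counted.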

Western blots. Western blotting was performed essentially as described
previously. E16.5 embryos were homogenized in RIPA buffer containing protease
inhibitor cocktail (Roche) with a dounce homogenizer. Forty micrograms of
protein per sample were resolved on 10% SDS-PAGE gels, transferred onto PVDF
membranes, immunoblotted with antibodies for c-kit (R&D Systems AF1356) and
GAPDH (Fitzgerald 10R-G109a), and then incubated with the appropriate alkaline
phosphatase-linked secondary antibodies. The PVDF membranes were visualized by
enhanced chemifluorescence (Amersham).

In vitro cardiomyocyte differentiation. The non-cardiomyocyte cell fraction was
isolated from a 3-month-old Kit+/Cre × R-GFP mouse. Cells were plated at a density
of 40,000 cells per well on gelatin-coated 6-well tissue culture dishes in DMEM
media containing 10% FCS, antibiotics and non-essential amino acids. After 2 days,
the cells were washed and treated with 10 nM dexamethasone in DMEM containing
10% FCS to induce differentiation. The media was refreshed every 3 days. After
1 week the cells were fixed with 4% paraformaldehyde and subjected to
immunohistochemistry for vimentin, α-actinin, troponin T and GATA4 (antibodies
listed in Supplementary Table 1). The cells were then imaged on an inverted
Nikon A1R confocal microscope.

Statistics. For studies involving induction of myocardial infarction, group sizes were
determined on the basis of previously observed postoperative mortality rates for this
procedure. No experimental animals were excluded in any of the analyses. Blinding
and randomization were not performed with the exception of the experiments in
Extended Data Fig. 8, which were done by two observers blinded to the sample
identity. For flow cytometry experiments and direct counting of cardiomyocytes
in histological sections or dissociated cardiomyocytes in dishes, two-group
comparisons were performed using Student’s two-tailed t-test, with P < 0.05
considered statistically significant. All error bars throughout the figures are
s.e.m. and all represented data are averages.
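
The two summary statistics named above, the s.e.m. and Student's two-tailed t-test, can be sketched with the Python standard library alone. The numbers below are hypothetical placeholders chosen purely for illustration and are not data from this study; the critical value 2.776 is the standard two-tailed 5% threshold for 4 degrees of freedom.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


# Hypothetical per-heart percentages of labelled cardiomyocytes (illustration only).
lineage_traced = [0.015, 0.018, 0.016]
reporter_only = [0.001, 0.000, 0.002]

t_stat, df = student_t(lineage_traced, reporter_only)
T_CRIT_DF4 = 2.776  # two-tailed, alpha = 0.05, 4 degrees of freedom

print(f"mean ± s.e.m.: {statistics.mean(lineage_traced):.4f} ± {sem(lineage_traced):.4f}")
print(f"t = {t_stat:.2f}, df = {df}, significant at P < 0.05: {abs(t_stat) > T_CRIT_DF4}")
```

Where SciPy is available, `scipy.stats.ttest_ind` returns the t statistic together with an exact P value, which avoids the lookup of a critical value.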
Extended Data Figure 1 | Assessing the fidelity and specificity of the Kit-Cre
knock-in allele. a, Representative histological sections from the indicated
tissues of Kit+/Cre × R-GFP mice at 4 weeks of age. Blue is nuclei and green is
eGFP. The data show eGFP expression in regions of each tissue that are often
characteristic of endogenous c-kit protein expression (n = 3 mice).
b, Immunohistochemistry for endogenous c-kit expression (red) in the mouse
ileum at 4 weeks of age from Kit+/Cre mice that contain the IRES-eGFPnls
cassette (but without the R-GFP reporter allele) so that eGFP expression can
be monitored in real time. The inset box and arrows show the co-staining with
c-kit antibody and eGFP (n = 3 mice). c, Immunohistochemistry for
endogenous c-kit expression (red) in quadriceps muscle of Kit+/Cre mice at
4 weeks of age versus nuclear eGFP (green) from the Kit+/Cre allele (n = 2
mice). Although lineage tracing in Kit+/Cre × R-GFP mice, which is
cumulative, showed abundant endothelial cells throughout the skeletal muscle
(a), instantaneous c-kit-expressing cells are rare in skeletal muscle, and when
identified, are always mononuclear (inset box). d, FACS quantification of bone
marrow from Kit+/Cre × R-GFP mice at 4 weeks of age sorted for eGFP
expression, of which 94% are positive for the ‘lineage’ cocktail of
differentiation-specific antibodies (averages from n = 3 mice, error bars
represent s.e.m.). Hence the Kit-Cre allele is properly expressed in bone
marrow and traces lineages that arise from c-kit+ progenitors.
e, Immunohistochemistry in the hearts of Kit+/Cre × R-GFP mice for
endogenous c-kit expression (red) versus all the cells that underwent
recombination throughout development and the first 4 weeks of life, shown in
green. Although cells that are actively expressing c-kit protein are very rare in
the heart (~5 per heart section), the arrow shows such a cell that is also eGFP+
for recombination. All of the currently c-kit-expressing cells identified in the
heart were eGFP+, further verifying the fidelity of the Kit-Cre allele (n = 3
hearts). f, Same experiment as in e except the testis was examined because of the
characteristic pattern of Leydig cells that are known to be actively
c-kit-expressing cells (n = 3 mice). The data show that greater than 80% of the
currently c-kit-antibody-reactive Leydig cells (red outline, better observed in
the right panel) are also eGFP+ (arrows show clusters of these cells).

Extended Data Figure 2 | Identification of non-myocytes from the hearts of
Kit+/Cre × R-GFP mice. Kit+/Cre × R-GFP mice were collected at 6 weeks of
age (constitutive lineage labelling the entire time), although myocardial
infarction was performed at week 4 to induce greater vascular remodelling and
potentially more c-kit lineage recruitment over the next 2 weeks. a, Hearts were
then collected at week 6 and subjected to immunohistochemistry with a pool of
antibodies for CD31, CD34, CD45 and CD3 in red, whereas the green channel
was for eGFP expression from the recombined R-GFP reporter allele due to
Kit-Cre lineage expression. The white arrowheads show endothelial cells that are
not contiguous with the underlying network, although most of the endothelial
cells are from the c-kit lineage when the red and green channels are compared.
The white arrow shows a cardiomyocyte that lacks red staining, whereas the
yellow arrows show two areas with relatively large cells that are eGFP+ and
could be mistaken for a cardiomyocyte, although they are also positive for the
non-myocyte marker panel of antibodies (n = 2 mice). b, c, Spread of cells
isolated from hearts of 8-week-old Kit+/Cre × R-GFP mice at baseline that were
subjected to immunocytochemistry for the indicated markers (n = 3 hearts).
The large white arrow in panel b shows an eGFP+ (green) cardiomyocyte that
also co-stains with sarcomeric α-actin (red). The smaller arrows show eGFP+
non-myocytes, which in panel c, were subject to staining with a cocktail of
antibodies again for CD31, CD34, CD45 and CD3 (all in red). This analysis
identifies nearly all of the non-myocytes in these cell spreads. The very last
image in panel c shows a fourth channel with higher gain so that the underlying
cardiomyocytes (CMs) autofluoresce (in white) to show the mixed nature of the
spread cells. Blue staining depicts nuclei.

Extended Data Figure 3 | Analysis of c-kit lineage labelling in the heart at P0
(birth). a, Diagram of the timing whereby newborn Kit+/Cre × R-GFP mice
were analysed for all subsequent experiments in this figure. b, Histological
sections for eGFP fluorescence (green) from the ileum and lung at P0 showing
the characteristic c-kit labelling pattern as observed at other time points or in
other studies when antibodies were used. Blue shows nuclei. c, Histological
section for eGFP fluorescence (green) from the heart at P0. Blue shows nuclei
and magnification was ×40. d, Immunohistochemical tissue section from the
P0 heart of Kit+/Cre × R-GFP mice stained with sarcomeric α-actin (red) to
show all underlying cardiomyocytes (right panel) or with eGFP expression in
green (left panel) as being c-kit-derived. The green cells noted by the arrows are
non-myocytes that do not express sarcomeric α-actin. e, eGFP expression alone
(left) or eGFP with co-staining for cardiomyocytes in red (sarcomeric α-actin)
from heart sections at P0 of Kit+/Cre × R-GFP mice (n = 3 mice). Blue staining
depicts nuclei. The cardiomyocyte that is shown has clear striations in the eGFP
staining pattern, whereas the two non-myocytes do not show striated eGFP and
also lack sarcomeric α-actin staining. f, eGFP expression alone in green (left)
with nuclei in blue or eGFP with sarcomeric α-actin co-staining (red) from
heart sections at P0 of Kit+/Cre × R-GFP mice. All eGFP+ cells shown lack
striations and are non-myocytes although the two cells in the centre sit directly
on top of cardiomyocytes and could be easily misinterpreted. Great care is
needed in scoring myocytes in the P0 heart because they are small and often the
same size as eGFP+ non-myocytes. g, eGFP expression (green) with nuclei in
blue and cardiomyocytes identified in red with sarcomeric α-actin antibody
from heart histological sections at P0 of Kit+/Cre × R-GFP mice. Here the data
show c-kit-lineage-derived cardiomyocytes that appear in a loose cluster
(arrows), presumably from a clonal expansion event earlier in development.

Extended Data Figure 4 | Additional examination of the Kit-MerCreMer
knock-in allele and its potential leakiness in the absence of tamoxifen.
a, Histological analysis of eGFP fluorescent cells from the indicated tissues of
Kit+/MCM × R-GFP mice that were given tamoxifen from 2 to 28 days of age
and then collected at day 28. Nuclei are shown in blue and green shows eGFP+
cells in the expected patterns for known regions of c-kit protein expression,
such as the distinct pattern of melanocytes in the skin and widespread
expression in the spleen and lungs. b, Representative immunohistochemical
analysis in the testis of Kit+/MCM × R-GFP mice for endogenous c-kit
expression (red) versus cells that underwent recombination when tamoxifen
was given by intraperitoneal injection (2 mg) for five consecutive days (green)
(n = 2 mice). The data show that most of the cells currently expressing c-kit
protein in testis (only Leydig cells react, red surface staining) are also eGFP+
(intracellular), indicating that recombination only occurs in c-kit-expressing
cells, and in the majority of them. c, Representative histological heart sections
from Kit+/MCM × R-GFP mice that were placed on tamoxifen-laden food or
vehicle food (n = 6 mice per treatment) beginning at 4 weeks of age and then
subjected to myocardial infarction injury 4 weeks later, followed by collection
4 weeks after that. In the presence of tamoxifen, histological sections through
the myocardial infarction border zone of the heart show widespread eGFP+
cells (green) from the c-kit lineage (left panel), whereas in the absence of
tamoxifen no eGFP+ cells are observed (right panel), indicating the
Kit-MerCreMer allele does not leak at baseline or after myocardial infarction
injury.

Extended Data Figure 5 | Analysis of eGFP+ non-myocytes in the hearts of
Kit+/MCM × R-GFP mice at baseline or after myocardial infarction injury.
a-g, Tamoxifen was given to Kit+/MCM × R-GFP mice for 1 day-6 months of
age (a, e, f) or in mice given tamoxifen and myocardial infarction injury
(b, c, d, g), followed by collection of the hearts for immunohistochemistry with
antibodies for GFP (green), or the indicated antibodies in red: CD45 (a),
CD3 (b), α-SMA (c), vimentin (d), CD34 (e), CD31 (f), vWF (g). Nuclei are
shown in blue. The white arrows show cells with coincident green and red
reactivity for each of the markers, although sometimes the red marker is
membrane-localized whereas the green (eGFP) is always cytoplasmic. The most
overlapping activity with GFP expression was observed for CD31 (endothelial
cells), then CD34, followed by CD45 (haematopoietic cells). n = 2
Kit+/MCM × R-GFP mice for 1 day-6 months of age; n = 4 Kit+/MCM × R-GFP
myocardial infarction. h, Averages from FACS plots for the CD31 cellular
fraction (antibody-detected) in the heart that are also eGFP+ from
Kit+/MCM × R-GFP mice (pre-MI, n = 3) after 8 weeks of tamoxifen in early
adulthood at either baseline or 4 weeks after myocardial infarction injury
(post-MI, n = 3). The data show about a doubling in the number of CD31 cells
that are eGFP+ after myocardial infarction (*P < 0.05 versus pre-MI).

Extended Data Figure 6 | Quantification of Cre activity and DNA
recombination in the hearts of Kit+/MCM × R-GFP mice. a, Timeline for
tamoxifen administration in Kit+/MCM × R-GFP mice. b, PCR from DNA
isolated from the bone marrow (BM), whole heart or semi-purified
cardiomyocytes after 6 weeks of tamoxifen treatment in Kit+/MCM × R-GFP
mice (n = 2). Bone marrow shows most of the DNA as having been recombined
by Cre, whereas whole heart is just barely discernible, and purified
cardiomyocytes show essentially no recombination given the sensitivity
constraints of this assay. c, qPCR was also run to more sensitively detect and
quantify the extent of recombination, which was set relative to the
recombination in bone marrow. Semi-purified cardiomyocytes (CM) showed
very low rates. Averaged data are shown and error bars are s.e.m. of duplicate
technical replicates from n = 3 Kit+/MCM × R-GFP mice. d, Schematic of
the tamoxifen time course and timing of myocardial infarction in
Kit*’““M x R-GFP mice. e, Echocardiography measured cardiac fractional 
shortening (FS%) was assessed in the mice after myocardial infarction, which 
shows a reduction in cardiac ventricular performance at 1, 2 and 4 weeks 
after injury. The number of mice analysed is shown in the bars. Error bars 
represent the s.e.m. Both the control and experimental groups showed an 
equivalent reduction in cardiac function post-myocardial infarction. f, Images 
of dissociated cardiomyocytes from hearts of Kit "’““M x R-GEP mice 4 weeks 
after myocardial infarction, which were fixed and stained for sarcomeric 
a-actin antibody (red) and eGFP (green) at two different magnifications. 
One eGFP* cardiomyocyte is shown with sarcomeric patterning of the 

eGFP fluorescence. 
Extended Data Figure 7 | Analysis of eGFP+ myocytes in the hearts of
Kit+/MCM × R-GFP mice after isoproterenol infusion-induced injury.
a, Schematic diagram showing tamoxifen treatment of Kit+/MCM × R-GFP
mice between 7 and 14 weeks of age with isoproterenol (ISO) infusion
occurring between weeks 10 and 14. b, c, Quantification and imaging of
dissociated cardiomyocytes (separate images shown at two different
magnifications) from the hearts of isoproterenol-injured Kit+/MCM × R-GFP
mice, which showed rare but definitive cardiomyocyte labelling. *P < 0.05
versus R-GFP; 31 eGFP+ cells of 395,302 counted from two hearts.
Extended Data Figure 8 | Verifying the extent of eGFP+ cardiomyocytes by
an independent laboratory from blinded histological heart samples.
Unprocessed cryosections and paraffin sections from the hearts of
Kit+/MCM × R-GFP mice after 8 weeks of tamoxifen were blinded and sent to
the Marban laboratory along with negative control sections from hearts that
should not have staining. a, b, Two separate images from cryopreserved blocks
are shown at ×200 magnification in which the cryosection was processed
for eGFP fluorescence (green) and α-actinin antibody (red) to show
cardiomyocytes. The data show two regions where a single eGFP+ myocyte is
visible in a region with several hundred GFP-negative cardiomyocytes. The
single eGFP+ cardiomyocyte is circled and the inset box shows a higher
magnification. Sections were also stained for nuclei (blue). In general,
approximately 1-2 definitive eGFP+ cardiomyocytes were identified per entire
heart section in the Marban laboratory, a result that is consistent with the
approximate numbers of kit lineage-labelled cardiomyocytes observed by us.
c, Image taken at ×630 magnification from a paraffin-embedded and processed
histological section in which both an eGFP antibody (green) and an α-actinin
antibody (red) were used. Nuclei are shown in blue. The arrow shows a single
eGFP+-expressing cardiomyocyte and the arrowheads show eGFP+
non-myocytes.
Extended Data Figure 9 | Assessing cardiomyocyte differentiation markers
from total non-myocytes in the heart. Adult cardiac interstitial cells isolated
from a Kit+/MCM × R-GFP mouse were treated with dexamethasone for 1 week.
Cells were then fixed and subjected to immunocytochemistry for the indicated
antibodies. c-kit-lineage-derived cells were green (eGFP+) and showed
fluorescence in the cytosol and nucleus. The data show eGFP+ cells that express
markers of differentiated cardiomyocytes such as α-actinin, troponin T and the
transcription factor GATA4 (all in red) but not the fibroblast marker vimentin
(white); nuclei were stained blue (right panels). These results indicate that
eGFP+ Kit-Cre-expressing cells can generate pre-differentiated
cardiomyocytes as well as non-eGFP interstitial cells; hence the cells identified
by the Kit-Cre (knock-in) reporter strategy are representative of how
endogenous c-kit+-expressing cells truly function.
Cepheid variables in the flared outer disk of our galaxy


Michael W. Feast¹,², John W. Menzies², Noriyuki Matsunaga³ & Patricia A. Whitelock¹,²


Flaring and warping of the disk of the Milky Way have been inferred 
from observations of atomic hydrogen’ but stars associated with 
flaring have not hitherto been reported. In the area beyond the Galac- 
tic centre the stars are largely hidden from view by dust, and the 
kinematic distances of the gas cannot be estimated. Thirty-two possible 
Cepheid stars (young pulsating variable stars) in the direction of the 
Galactic bulge were recently identified*. With their well-calibrated 
period-luminosity relationships, Cepheid stars are useful distance 
indicators*. When observations of these stars are made in two col- 
ours, so that their distance and reddening can be determined simul- 
taneously, the problems of dust obscuration are minimized. Here 
we report that five of the candidates are classical Cepheid stars. These 
five stars are distributed from approximately one to two kiloparsecs 
above and below the plane of the Galaxy, at radial distances of about 
13 to 22 kiloparsecs from the centre. The presence of these relatively 
young (less than 130 million years old) stars so far from the Galactic 
plane is puzzling, unless they are in the flared outer disk. If so, they 
may be associated with the outer molecular arm’. 

We derived the distances for the five Cepheids from near-infrared pho- 
tometry obtained with the Infrared Survey Facility (IRSF) and we used 
radial velocities from the Southern African Large Telescope (SALT) to 
determine the kinematics (see Methods)—both telescopes are at the 
South African Astronomical Observatory (SAAO), Sutherland, in South 
Africa. From these data we were able to ascertain the population to which 
the Cepheids belong. The other 27 Cepheid candidates are either better 
assigned to a different class (such as anomalous Cepheids) or else their 
classification as classical Cepheids is uncertain. 

Table 1 lists the derived distances and various other parameters for 
the Cepheids. They are at about the distance and position at which a stream 
associated with the Sagittarius (Sgr) dwarf galaxy crosses the plane’, but 
the low radial velocity (mean heliocentric radial velocity after correction
for the effects of stellar pulsation of V_R = 4 ± 8 km s⁻¹; see Table 1)
is completely different from that expected for members of the Sgr dwarf
stream (about 150 km s⁻¹)⁶,⁷ and the Cepheids are clearly Galactic. They
cannot be in the Galactic bulge because their distances from the centre
put them far beyond the bulge and the velocity dispersion of the five
stars, 16 ± 5 km s⁻¹ (much of which is observational), is much smaller
than expected for bulge objects (>60 km s⁻¹)⁸. Furthermore, these short-
period Cepheids will be relatively young (about 100 million years (Myr)
old), and, although there is a young component, including Cepheids”, 


Table 1 | Data for individual Cepheids

OGLE     l       b       D       z       R       V_R        ρ          P
number   (deg)   (deg)   (kpc)   (kpc)   (kpc)   (km s⁻¹)   (km s⁻¹)   (day)
01       -0.03    2.94   24.7     1.3    16.2    -12         -3        2.598
02        4.57    4.85   23.2     2.0    14.7    +31         50        2.026
03        4.35    2.89   22.1     1.1    13.6     +5         24        1.236
05        5.38    2.34   22.3     0.9    13.8     +7         28        3.796
32        6.89   -3.89   30.4    -2.1    22.0    -10         15        3.736

OGLE numbers are prefixed by ‘OGLE-BLG-CEP-’. l and b are the Galactic coordinates. D is the distance
from the Sun (D is uncertain to less than about 2 kpc), z is the distance from the Galactic plane and R is
the perpendicular distance from the axis of Galactic rotation (assuming the distance from the Sun to the
Galactic centre is 8.5 kpc). V_R is the measured heliocentric radial velocity corrected for pulsation
(±15 km s⁻¹). ρ is the radial velocity after correction for solar motion, Galactic rotation and the effects of
stellar pulsation. P is the pulsation period.


in the innermost regions of the bulge, the bulk of the population is old 
(about 10 billion years (Gyr) old)*. Figure 1 shows the positions of the 
five stars in comparison to catalogued Cepheids. The various sources 
of uncertainty for the distances of the Cepheids are discussed in the 
Methods, but the reddening law and reddening corrections presented 
the biggest challenge and are the primary contributors to the error bars 
shown in the figure. 
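
The z and R columns of Table 1 follow from l, b and D by simple geometry, and a short calculation makes that relationship explicit. The sketch below is illustrative only and is not the authors' code: the function name is hypothetical, the input values are those read from Table 1, and the 8.5 kpc distance to the Galactic centre is the value stated in the table notes.

```python
import math

R0_KPC = 8.5  # Sun to Galactic centre distance adopted in the Table 1 notes


def plane_height_and_radius(l_deg, b_deg, d_kpc, r0_kpc=R0_KPC):
    """Return (z, R) in kpc from Galactic coordinates (l, b) and distance D."""
    l = math.radians(l_deg)
    b = math.radians(b_deg)
    z = d_kpc * math.sin(b)        # height above (+) or below (-) the plane
    d_plane = d_kpc * math.cos(b)  # distance projected onto the plane
    # Law of cosines in the plane: Sun, projected star, rotation axis
    r = math.sqrt(r0_kpc**2 + d_plane**2
                  - 2.0 * r0_kpc * d_plane * math.cos(l))
    return z, r


# (OGLE number, l, b, D) as read from Table 1
TABLE_1 = [
    ("01", -0.03, 2.94, 24.7),
    ("02", 4.57, 4.85, 23.2),
    ("03", 4.35, 2.89, 22.1),
    ("05", 5.38, 2.34, 22.3),
    ("32", 6.89, -3.89, 30.4),
]

for name, l_deg, b_deg, d_kpc in TABLE_1:
    z, r = plane_height_and_radius(l_deg, b_deg, d_kpc)
    print(f"OGLE-BLG-CEP-{name}: z = {z:+.2f} kpc, R = {r:.2f} kpc")
```

The computed values agree with the tabulated z and R to within about 0.1 kpc, which is well inside the quoted distance uncertainty.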

There is almost no information on gas or stars in the Galactic disk 
immediately behind (Galactic longitude l ± 15°) the Galactic centre. The
atomic hydrogen observations’ on either side of the centre, but away 
from the central region itself, suggest that the gaseous disk of the Milky 
Way at l ~ 0 is not warped but shows a marked flaring at Galactocentric
radii (R, the distance from a star to the centre of the Galaxy) of
15 kiloparsecs (kpc) and more; we note that the details are model depen- 
dent. The thickness of the gaseous disk’ increases from 60 parsecs (pc) 
half-width at half-maximum (HWHM) at R = 4 kpc to 2.7 kpc at R = 30 kpc 
and, especially at positive Galactic longitudes’, there is a marked increase 
from about 0.4 kpc at R = 15 kpc to about 1.0 kpc at R = 20 kpc. 

Therefore we found the Cepheids at exactly the distance predicted 
for this increase in disk thickness, as can be seen in Fig. 1. The absence 
of Cepheids nearer the Sun is consistent with the lower HWHM in these 
regions, whereas the absence of more distant Cepheids is partly due to 
the decreasing density at larger distances from the centre and partly the 
consequence of the Optical Gravitational Lensing Experiment (OGLE) 
observational cut-off. So the relatively narrow range of distances is 
consistent with our hypothesis that these stars are in the flared disk. In 
the Methods we also show that the number of Cepheids observed is
consistent with expectations from a flared disk. 

Cepheids are usually associated with spiral arms and the distances of 
these five are similar to that expected for the far outer molecular spiral 
arm of the Galaxy” where it passes behind the central region of the Galaxy; 
the HWHM of this arm may be only about 0.6 kpc, in which case the 
Cepheids would be on its periphery. However, we note that distances 
and thickness computed for this arm depend sensitively on the model 
adopted and are therefore uncertain””. 

It is instructive to examine why the outer regions of a galactic disk 
flare. In the inner parts of a galactic disk the gravitational force k(z) at 
height z perpendicular to the galactic plane is dominated by the strong 
concentration of stars there. As we move to greater galactocentric radii, 
however, the concentration of stars drops dramatically, k(z) decreases 
and is increasingly dominated by the effects of dark matter. The flaring 
of the gas layer in the outer parts of our own and other galaxies has been 
attributed to this, and observations can in principle be used to study the 
distribution of dark matter in the halo of galaxies’’. Studies of the flaring
of H I gas in our Galaxy’ suggest that in addition to an isothermal dark
halo of 1.8 × 10¹² M⊙, where M⊙ is the mass of the Sun, there is a
self-gravitating exponential dark-matter disk (1.8-2.4 × 10¹¹ M⊙) as well
as a dark-matter ring (13 kpc < R < 18.5 kpc and 2.2-2.8 × 10¹⁰ M⊙),
which may represent the remains of a cannibalized dwarf galaxy’. The 
most serious uncertainty in using gas as a tracer of the gravitational 
field arises from the need to adopt a model to derive the gas distribu- 
tion. It is therefore highly desirable that the gravitational field in the 
outer Galaxy be investigated using young stars for which good distance 


¹Astrophysics, Cosmology and Gravity Centre, Astronomy Department, University of Cape Town, Rondebosch 7701, South Africa. ²South African Astronomical Observatory, PO Box 9, Observatory 7935,
Figure 1 | Schematic of the Galaxy. The positions of the Cepheids (open
circles with assumed maximum uncertainties of ±0.2 mag) are compared to the
location of the H I gas. The solid and dashed curves are model fits, S and N1,
respectively, from ref. 1 at three times the HWHM above and below the Galactic
plane. We note that figures 1 and 2 of ref. 2 show the H I flare in the relevant


estimates can be made. Classical Cepheid variables are by far the best 
stars for this purpose. 

Studies of diffuse groups of B stars’*, which are even younger than 
Cepheids, are also consistent with a Galactic disk extending 15 kpc and 
20 kpc from the centre, at Galactic latitude b = —4° and —7°, respec- 
tively. These stars are in the third Galactic quadrant near the place where 
the warp forces the Galactic plane to its greatest negative displacement 
from b = 0°. So although these young stars are displaced from b = 0°, 
they are in the local Galactic plane, and therefore tell us nothing about 
a flare. 

The collection of stars now known as the ‘Monoceros ring’ has been 
interpreted as evidence for a warped disk”, or alternatively as the rem- 
nant of a dwarf galaxy cannibalized by the Milky Way™. It is perhaps 
curious that the Cepheids discussed above are at the distance from the 
Galactic centre that one would expect the Monoceros ring to be, if indeed
it were a complete circular ring around the Galaxy. The stellar popu- 
lation that makes up this so-called ring is generally considered to be old 
(>1 Gyr) and therefore different from the Cepheids (although there have 
been suggestions of an association with spiral arms’). Models’ indi- 
cate that the ages of the youngest Cepheids discussed here are less than 
130 Myr. The disputed origin of the Monoceros ring'® is beyond the 
scope of this Letter. Nevertheless, we note that simulations that suggest 
that the ring is a consequence of the interaction of the Sgr dwarf galaxy 
with the Milky Way” do not predict any significant density of stars in 
the ring at the distance of the Cepheids under discussion. 

Clearly, these Cepheids are just the tip of the iceberg. Further work 
on these stars and other ‘standard candles’ in the outer Galaxy will pre- 
sent new opportunities to probe the gravitational field and therefore the 
distribution of dark matter in the outer parts of our Galaxy. 


METHODS SUMMARY 


The Fourier coefficients listed for each light curve of the candidate Cepheids’ were 
compared with those of classical Cepheids in the Large Magellanic Cloud (LMC)"* 
to show that five of the stars with periods greater than one day fall clearly into the clas- 
sical Cepheid class; we can therefore derive their distances from their luminosities. 

The distances and the interstellar absorptions were derived together using pairs 
of colours (V and I or J and Ks). The results from the infrared magnitudes were 
adopted because the uncertainty due to interstellar reddening is significantly higher at 
shorter wavelengths. The detailed analysis indicates that the reddening law towards 
the Galactic centre is abnormal, as is well known". Various sources of uncertainty 
on the distance moduli are discussed in detail in the Methods, but the reddening 
law and the exact values of the reddening are the primary contributors, which lead 
to our estimate of the upper limit to the uncertainty of ±0.2 magnitudes (mag).
Radial velocities were determined by cross-correlation with a synthetic spectrum 
and the zero point of the velocity scale was confirmed by observation of two stars 
with known velocities. An approximate calculation can be made of the numbers of 


region extending up to about 2 kpc. The dark grey points are previously known
Galactic Cepheids’ and the approximate regions surveyed by OGLE
(2° < |b| < 6°) are shown in light grey on either side of the plane. The positions of
the Sun and Galactic centre are indicated by the star symbols.


Cepheids expected by extrapolating from the solar neighbourhood and assuming 
a scale length of 3 kpc within the plane. If the Cepheids in the flared disk have the 
same scale height as the gas (577 pc) then we would expect about 18 to exist above 
a height of 1 kpc from the plane in the direction surveyed by OGLE. Given the 
uncertainties, this is consistent with the five Cepheids that we do find, particularly 
as we do not expect our sample to be complete. 
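
The scale height and scale length quoted above can be related to the half-widths given in the main text with a few lines of arithmetic. The following is a minimal sketch under stated assumptions, not the authors' calculation: it assumes an exponential vertical profile exp(-|z|/h), for which h = HWHM / ln 2, and an exponential radial profile with the 3 kpc scale length. The function names are hypothetical. It does not attempt to reproduce the expected count of about 18, which also depends on the survey geometry.

```python
import math


def scale_height_from_hwhm(hwhm_pc):
    """Scale height h of an exponential layer exp(-|z|/h) with the given HWHM."""
    return hwhm_pc / math.log(2.0)


def radial_density_ratio(r_kpc, r_ref_kpc=8.5, scale_length_kpc=3.0):
    """Density at radius r relative to r_ref for an exponential disk."""
    return math.exp(-(r_kpc - r_ref_kpc) / scale_length_kpc)


def fraction_beyond(z_pc, scale_height_pc):
    """Fraction of an exponential column lying at |z| greater than z_pc."""
    return math.exp(-z_pc / scale_height_pc)


h_flared = scale_height_from_hwhm(400.0)  # HWHM of about 0.4 kpc at R = 15 kpc
drop = 1.0 / radial_density_ratio(15.0)   # density drop from 8.5 kpc to 15 kpc

print(f"scale height for HWHM 400 pc: {h_flared:.0f} pc")
print(f"density drop from 8.5 to 15 kpc: factor of {drop:.1f}")
print(f"fraction of column beyond 1 kpc: {fraction_beyond(1000.0, h_flared):.3f}")
```

Under these assumptions a 400 pc half-width corresponds to a scale height of about 577 pc, matching the value used above, and the radial density falls by a factor of roughly 9 between the solar circle and R = 15 kpc.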


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper 
METHODS


In the following, we describe how the stars were identified as classical Cepheids by 
comparison with similar stars in the LMC". We then go on to derive distances, taking 
into account the well known abnormal reddening law towards the Galactic centre””. 
Identifying classical Cepheids. A problem when studying Cepheids is that it is 
not always easy from available photometry to distinguish classical Cepheids (type I) 
from other objects, for example, anomalous or type II Cepheids (BL Her stars, W 
Vir stars). This issue has been discussed in the context of distant Cepheids towards 
the anti-centre” (that is, the direction in the Galaxy that is opposite from the centre, 
viewed from our perspective). In the interior region of the Galaxy, and particularly 
in the direction of the bulge, this is likely to be a significant problem. Fortunately it is 
possible to distinguish between some classes of stars using the Fourier coefficients of 
their light curves, and these are listed for the OGLE’ Cepheids towards the bulge. 
The main Fourier parameters for the I-band light curves of the five stars discussed
here (Extended Data Table 1) can be compared with plots of the Fourier coefficient
ratios R₂₁, R₃₁ and phase differences φ₂₁, φ₃₁ (where the subscripts denote the order
of the cosine curve fit) against period for various classes of variable star in the LMC’* 
and this enables us to classify these five securely as classical Cepheids. Other pos- 
sible Cepheids in the OGLE bulge catalogue have characteristics that suggest they 
belong to the anomalous Cepheid class, are possible type II Cepheids or else their 
classification is doubtful. 
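
To make the Fourier parameters concrete, the sketch below computes amplitude ratios and phase differences from a cosine-series description of a light curve. It is an illustration under assumptions rather than the procedure used for the OGLE catalogue: it adopts the conventional definitions R_k1 = A_k/A_1 and φ_k1 = φ_k − kφ_1, assumes magnitudes sampled at evenly spaced phases over one cycle (real, unevenly sampled photometry would need a least-squares fit), and uses a synthetic light curve with made-up coefficients. All function names are hypothetical.

```python
import math


def cosine_series_coefficients(mags, max_order=3):
    """Amplitudes A_k and phases phi_k of
    m(x) = A_0 + sum_k A_k * cos(2*pi*k*x + phi_k).

    Assumes `mags` samples one full cycle at evenly spaced phases x_j = j/N.
    """
    n = len(mags)
    amps, phases = {}, {}
    for k in range(1, max_order + 1):
        a = 2.0 / n * sum(m * math.cos(2 * math.pi * k * j / n)
                          for j, m in enumerate(mags))
        b = 2.0 / n * sum(m * math.sin(2 * math.pi * k * j / n)
                          for j, m in enumerate(mags))
        amps[k] = math.hypot(a, b)
        phases[k] = math.atan2(-b, a)  # a = A cos(phi), b = -A sin(phi)
    return amps, phases


def fourier_parameters(amps, phases):
    """Return R21, R31, phi21, phi31; phase differences wrapped to [0, 2*pi)."""
    two_pi = 2 * math.pi
    return {
        "R21": amps[2] / amps[1],
        "R31": amps[3] / amps[1],
        "phi21": (phases[2] - 2 * phases[1]) % two_pi,
        "phi31": (phases[3] - 3 * phases[1]) % two_pi,
    }


# Synthetic, purely illustrative light curve (not data from the paper)
N = 64
TRUE_AMP = {1: 0.30, 2: 0.12, 3: 0.05}
TRUE_PHI = {1: 0.4, 2: 1.1, 3: 2.0}
mags = [
    15.0 + sum(TRUE_AMP[k] * math.cos(2 * math.pi * k * j / N + TRUE_PHI[k])
               for k in (1, 2, 3))
    for j in range(N)
]

params = fourier_parameters(*cosine_series_coefficients(mags))
for name, value in params.items():
    print(f"{name} = {value:.3f}")
```

Plotting such parameters against period for stars of known class is what allows an unknown variable to be placed in one class or another.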
Photometry. The infrared photometry (Extended Data Table 2) was carried out 
using the 1.4-m IRSF and the SIRIUS camera at Sutherland”’. Each of the targets 
was observed once on 2012 May 6 (Universal time, UT) with an exposure time of 
25s (5s times five exposures at dithered positions). The photometry was extracted 
using the Image Reduction and Analysis Facility (IRAF) package DAOPHOT
(http://iraf.noao.edu) and standardized by comparison with nearby stars from the 2MASS
point source catalogue”. The uncertainties for the brightest and faintest of the
Cepheids range from 0.02-0.07 mag at J, 0.02-0.03 mag at H and 0.02-0.04 mag at
Ks, respectively. These are significantly less than the uncertainties on the 2MASS
measures, where they exist, for the same sources. We use these single-epoch J, H 
and Ks measurements to estimate the distance, noting that the near-infrared ampli- 
tudes of these short-period stars will be small (<0.1 mag; ref. 23). 
Distances and interstellar absorptions. In general there are severe problems in 
dealing with observations of distant stars in the Galactic plane close to or beyond 
the centre because of the large and uncertain amounts of interstellar extinction in 
these directions. Cepheids offer an important advantage in this regard in that dis- 
tances can be derived from relations that allow the reddening and the distance to 
be determined together and unambiguously when observations in two colours are 
available (for example, V and I or J and Ks), provided the reddening law is known.
Recent work’’ has indicated that the law of reddening is different towards the 
Galactic bulge from that adopted elsewhere” and here we use: 


$$A_{K_s} = (0.494 \pm 0.006)\,E(J - K_s)$$


from ref. 19 and 


$$A_I = (1.125 \pm 0.09)\,E(V - I)$$


which was found by the same method”*. It should be noted that the relation in V 
and I may be somewhat more complex than the one given”. 

Adopting period-luminosity relations in V and I (as derived’® by the OGLE 
group) and J and Ks (derived’’ for Cepheids with 0.4 < logP < 1.0) from the LMC 
together with an LMC distance modulus of 18.5 mag and interstellar extinction 
values** of A_V = 0.22 mag, A_I = 0.13 mag, A_J = 0.06 mag and A_Ks = 0.02 mag for
the LMC direction, we then have the distance modulus μ₀ for a Cepheid with a
pulsation period P:


$$\mu_0 + A_V = V + 2.762 \log P + 1.190 \qquad (1)$$
$$\mu_0 + A_I = I + 2.959 \log P + 1.751 \qquad (2)$$
and
$$\mu_0 + A_J = J + 3.138 \log P + 2.109 \qquad (3)$$
$$\mu_0 + A_{K_s} = K_s + 3.284 \log P + 2.383 \qquad (4)$$


Combining these pairs of equations with the reddening law given above leads to the 
two estimates of the distance modulus, (μ₀)_VI and (μ₀)_JK (derived, respectively,
from equations (1) and (2) and equations (3) and (4)) and the interstellar extinction
corrections, A_I and A_Ks (Extended Data Table 2).
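
The way a pair of period-luminosity equations and a reddening law yield the distance and the extinction together can be sketched as follows for the J and Ks case. Subtracting equation (4) from equation (3) gives A_J − A_Ks = E(J − Ks); the reddening law then gives A_Ks, and equation (4) gives μ₀. The code is a minimal illustration, not the authors' reduction: the function name is hypothetical, and the input magnitudes are synthetic values generated from an assumed modulus and extinction, since the measured photometry (Extended Data Table 2) is not reproduced here.

```python
import math

A_KS_OVER_E_JKS = 0.494  # reddening-law coefficient adopted in the text


def modulus_from_j_ks(j_mag, ks_mag, period_days):
    """Solve equations (3) and (4) with A_Ks = 0.494 * E(J - Ks).

    Returns (mu0, A_Ks, distance in kpc).
    """
    log_p = math.log10(period_days)
    mu_plus_aj = j_mag + 3.138 * log_p + 2.109    # equation (3): mu0 + A_J
    mu_plus_aks = ks_mag + 3.284 * log_p + 2.383  # equation (4): mu0 + A_Ks
    e_jks = mu_plus_aj - mu_plus_aks              # A_J - A_Ks = E(J - Ks)
    a_ks = A_KS_OVER_E_JKS * e_jks
    mu0 = mu_plus_aks - a_ks
    distance_kpc = 10 ** (mu0 / 5.0 + 1.0) / 1000.0
    return mu0, a_ks, distance_kpc


# Synthetic check: build J and Ks from an assumed modulus and extinction
# (illustrative numbers only), then recover them.
PERIOD = 2.598  # days, the period of OGLE-BLG-CEP-01 in Table 1
MU0_TRUE, A_KS_TRUE = 16.9, 0.5
E_TRUE = A_KS_TRUE / A_KS_OVER_E_JKS
LOG_P = math.log10(PERIOD)
ks_obs = MU0_TRUE + A_KS_TRUE - 3.284 * LOG_P - 2.383
j_obs = MU0_TRUE + A_KS_TRUE + E_TRUE - 3.138 * LOG_P - 2.109

mu0, a_ks, d_kpc = modulus_from_j_ks(j_obs, ks_obs, PERIOD)
print(f"mu0 = {mu0:.2f} mag, A_Ks = {a_ks:.2f} mag, D = {d_kpc:.1f} kpc")
```

The same algebra applies to the V and I pair, with equations (1) and (2) and the coefficient 1.125 for A_I.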

Uncertainties in the distances. The LMC period-luminosity relations that we used 
are well defined‘. Their absolute calibration is based on the LMC distance modulus, 
which has been determined in a number of ways. The uncertainty in the adopted 
value (about 0.04 mag or 2%) is negligible for our discussion. The mean OGLE VI 
magnitudes derived from their extensive observations have negligible error.
Because of the small pulsation amplitudes of the Cepheids in the infrared region
of the spectrum, the error on our J and Ks magnitudes is ≤0.05 mag.

Possible metallicity effects on the period-luminosity relations have been much 
discussed*. Nothing is known about the metallicities of Cepheids behind the Galactic 
centre, but those of Cepheids in the outer disk of the Galaxy in the general direction 
of the anti-centre”’, and at comparable distances from the centre to those discussed 
here, have a mean logarithmic iron-to-hydrogen ratio [Fe/H] = −0.60 ± 0.12, which
is intermediate between those of the LMC and the Small Magellanic Cloud”. 

The difference in distance between the Small Magellanic Cloud and the LMC 
derived from J and Ks observations of Cepheids*’ agrees with values measured in 
other ways, without the application of metallicity corrections. Furthermore, Hubble 
Space Telescope parallaxes of Galactic Cepheids (with [Fe/H] ~ 0) agree with the 
LMC modulus adopted without the application of any metallicity corrections (they 
give 18.52 ± 0.03 mag from V and I and 18.47 ± 0.03 mag from Ks). The various
factors indicate that any residual metallicity effects on the distances derived for 
these Cepheids will be very small. 

A potential source of uncertainty is in the width of the period-luminosity rela- 
tions. This width is due to the fact that, at a given period, a Cepheid brighter than 
the average is also bluer. This leads to the smaller-than-average V being compensated 
by a lower-than-average derived apparent absorption. It is clear that the uncertainty 
in the modulus due to the spread in colour at a given period is*: 


$$\sigma(\mu_0) = (\beta_1 - \beta_2)\,\sigma_{(V-I)} \qquad (5)$$


where β₁ is the colour coefficient of a (nearly dispersionless) period-luminosity-
colour relation in (V, I) and β₂ is the ratio of total to selective absorption. For the
Cardelli* law of reddening, which is often used, β₁ ≈ β₂. Thus, any uncertainty due
to the width of the period-luminosity relation in our case comes from the change
in β₂ for the bulge, which is 0.33. The scatter in V − I at a given period” is 0.08,
which would result in σ(μ₀) = 0.03 in equation (5). In the infrared, the widths of the
period-luminosity relations are lower and will not introduce significant uncertainty. 

Interstellar reddening is a source of error and, as pointed out above, the evidence 
points to an abnormal reddening law in the direction of these stars. The uncer- 
tainty in this reddening law in JKs is small; this, together with the low extinction in 
the infrared, leads to only a small uncertainty in the distance modulus (0.003 mag 
for the most heavily reddened star). Even if, contrary to the evidence, we used the 
Cardelli* law of reddening, the change in distance moduli would not affect our 
conclusions. In that case, the modulus of our most reddened star would decrease 
by 0.28 mag (a change of distance from 24.4 kpc to 21.4 kpc) and the moduli of the 
other stars would decrease by an average of 0.12 mag (1.25 kpc). Owing to the greater 
absorption in V and I and the greater uncertainty in the reddening law, the uncertainties
in the distances derived from VI are greater. The uncertainty in the reddening
coefficient leads to an uncertainty of 0.25 mag in the modulus of the most heavily 
reddened star and a mean of 0.12 mag in the other cases. If, contrary to expecta- 
tions, a Cardelli reddening law had been adopted, the modulus of the most heavily 
absorbed star would have decreased by 0.90 mag and those of the others by a mean 
of 0.42 mag. Clearly, reddening uncertainties in VI are much more important than 
in JK. 
Summary of adopted distances and their uncertainties. In the main paper we 
adopt the distances derived from the J and Ks magnitudes (see Table 1), because 
they are the more accurate values. The above discussion indicates that the errors in 
those distance moduli are: 0.04 mag from the absolute calibration; ≤0.05 mag due
to the pulsation amplitude; and negligible amounts from the period-luminosity
relation width, metallicity effects and uncertainties in the Nishiyama reddening law. If 
the Cardelli reddening law were applied to these stars their moduli would be reduced 
by a mean of 0.15 mag, but a change this large seems to be ruled out by observa- 
tions. The systematic uncertainties overwhelm the rather small statistical errors, so 
we do not attempt to assign individual errors to distances. We consider 0.2 mag to 
be a very conservative estimate of the total error of an individual modulus (random
plus systematic) and this is what is illustrated in Fig. 1, but we fully expect the errors
to be less than this. In the case of the moduli from V and I, the main uncertainty is
from the coefficient of the reddening law and complications in deriving this have
been noted’’. We simply mention here that with the adopted law the VI moduli are
0.20 mag larger than the JKs values adopted, whereas with a Cardelli law they are
0.31 mag smaller, suggesting that a less extreme variation from the Cardelli law 
applies to these stars. 
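
Because the distance scales as 10^(μ₀/5), a shift or error in the modulus translates directly into a fractional change in distance. The sketch below, with hypothetical function names, shows the conversion; it checks the change from 24.4 kpc quoted above for a 0.28 mag decrease and then applies the conservative ±0.2 mag bound to a distance of 24.7 kpc taken from Table 1.

```python
import math


def modulus_from_distance_kpc(d_kpc):
    """Distance modulus mu0 = 5 log10(d / 10 pc)."""
    return 5.0 * math.log10(d_kpc * 1000.0) - 5.0


def distance_kpc_from_modulus(mu0):
    """Inverse of modulus_from_distance_kpc."""
    return 10 ** (mu0 / 5.0 + 1.0) / 1000.0


def shifted_distance_kpc(d_kpc, delta_mu):
    """Distance after changing the modulus by delta_mu magnitudes."""
    return d_kpc * 10 ** (delta_mu / 5.0)


# A 0.28 mag decrease in modulus applied to 24.4 kpc (values from the text)
d_cardelli = shifted_distance_kpc(24.4, -0.28)
print(f"24.4 kpc with the modulus reduced by 0.28 mag: {d_cardelli:.1f} kpc")

# Conservative +/-0.2 mag bound applied to D = 24.7 kpc from Table 1
near = shifted_distance_kpc(24.7, -0.2)
far = shifted_distance_kpc(24.7, +0.2)
print(f"+/-0.2 mag at 24.7 kpc spans {near:.1f} to {far:.1f} kpc")
```

A 0.2 mag error corresponds to roughly a 10% error in distance, or about 2 kpc at the distances of these stars.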
Radial velocities. Our spectra (Extended Data Table 3) were obtained with the 
Robert Stobie spectrograph on the SALT. A volume phase holographic grating 
of 1,300 lines mm⁻¹ was used to cover the wavelength range 7,800-9,600 Å, putting
the Ca II triplet on the middle charge-coupled device of the detector. The resolution
is 3.4 Å with a projected slit width of 1.5 arcsec.

Radial velocities were obtained by cross-correlation of the spectra with a syn- 
thetic spectrum taken from the library assembled for the RAVE experiment™*. Two 
stars with known radial velocities were used as a check on the radial velocity zero
point. The measured velocities for these two stars are 34.7 km s⁻¹ and −12.0 km s⁻¹,
respectively. Mean radial velocity errors due to photon statistics are 10 km s⁻¹, so
the radial velocity zero point seems to be secure. The measured heliocentric velocities
have been corrected for stellar pulsation adopting a standard velocity curve
for short-period Cepheids with a semi-amplitude of 20 km s⁻¹ (for example, figure
6 of ref. 36). The mean heliocentric velocity after correction is 4 ± 8 km s⁻¹ compared
with 13 ± 7 km s⁻¹ before correction. This indicates that uncertainties in the
correction will not affect our conclusion regarding the mean radial outward velocity
of this group of stars and that the error given in the main text is realistic.
Galactic structure. In the main paper and in the following we adopt a distance 
from the Sun to the Galactic centre of 8.5 kpc and a flat rotation curve with a velocity
Θ = 220 km s⁻¹, to allow for a direct comparison with models describing the
H I gas behaviour in the outer Galactic disk’. Plausible changes” in these values will
not affect our conclusions. 

The heliocentric distances of the Cepheids (D values in Table 1) are comparable 
with that of the Sgr dwarf galaxy (about 24 kpc), and a tidal stream from this system
crosses the Galactic plane, behind the Galactic centre, close to the Galactic bulge at 
positive Galactic longitude. RR Lyrae variables belonging to this stream have recently 
been found in our field** at a distance of around 27 kpc. The possibility that the Sgr 
system contains stars as young as about 100 Myr has also been raised”. This is the 
expected age of short-period classical Cepheids’*, so we cannot rule out the pos- 
sibility that our stars belong to the Sgr system on the basis of photometry alone, and 
kinematic information is essential. Because the heliocentric radial velocities of Sgr 
dwarf members are about 150 km s⁻¹ (refs 6 and 7), it is clear from the velocities in
Table 1 that our Cepheids belong instead to the far outer parts of the Galactic disk. 

The possible association of the Cepheids with the far outer molecular spiral arm° 
was raised in the main paper. This arm lies at positive Galactic longitudes (in the first 
quadrant). At l = 13.25°, the lowest longitude at which the carbon monoxide (CO)
was measured, the estimated distance’ is D = 23 kpc, corresponding to R = 14.5 kpc; 
the exact distances are sensitive to the kinematic model. These are somewhat less 
than the D and R values in Table 1. Our values (in Table 1) of course refer to a region
where there is no information from the gas. Adopting an alternative kinematic model’ 
with elliptical orbits will lead to larger derived distances of the gas. 

The five Cepheids are concentrated in a relatively small part of the area covered 
by the OGLE survey at positive longitudes. It is possible that variable interstellar 
absorption over the field could account for this. However, it seems more likely that 
it is due to real clumping, such as is common for young objects in spiral arms. The 
far outer molecular arm has not yet been seen emerging from the Galactic centre 
region at negative longitudes. 

We note that our stars are at Galactocentric distances comparable to, but greater 
than, those of a small number of masers defining an outer arm in the general direc- 
tion of the anti-centre*°. The Cepheids whose radial velocities were studied in the 
general direction of the anti-centre are also at somewhat shorter distances (mean 
R = 12.9 kpc)”.

The Sun-Cepheid-Galactic centre angle is small for all of our stars (0° to 2.7°, as 
measured in the Galactic plane). Thus, the corrected radial velocity ρ primarily measures
motion that is radial from the Galactic centre and the five Cepheids give a mean
ρ = 23 ± 9 km s⁻¹, indicating a significant mean outward radial motion. This would
not be inconsistent with a value of 9 km s⁻¹ predicted in the model’. It should also
be noted that in the general solar neighbourhood systematic deviations from circular
motion of about 10 km s⁻¹ are known for OB stars and Cepheids in regions
of around a kiloparsec radius*'. Therefore, our result does not necessarily imply
any significant general deviation from circular orbits. The uncertainty in this con- 
clusion is related to the small number of objects involved rather than to the uncer- 
tainties in estimating mean Cepheid velocities. 
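
The mean velocities quoted here can be checked directly against the individual values in Table 1. The sketch below is illustrative and rests on one assumption not stated in the text: that the quoted uncertainties are standard errors of the mean computed from the sample standard deviation. The function name is hypothetical.

```python
import math
import statistics

# Velocities in km/s for the five Cepheids, in the order listed in Table 1
V_R = [-12, 31, 5, 7, -10]  # heliocentric, corrected for pulsation
RHO = [-3, 50, 24, 28, 15]  # also corrected for solar motion and rotation


def mean_and_standard_error(values):
    """Return the sample mean and the standard error of the mean."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


for label, values in (("V_R", V_R), ("rho", RHO)):
    mean, sem = mean_and_standard_error(values)
    print(f"mean {label} = {mean:.1f} +/- {sem:.1f} km/s")
```

Rounded to the nearest km s⁻¹, these reproduce the 4 ± 8 km s⁻¹ and 23 ± 9 km s⁻¹ given in the text, and with only five stars the standard error is dominated by the sample size.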

In the general anti-centre region at somewhat smaller Galactocentric distances 
no evidence was found for a general outward velocity, though curiously the three 
Cepheids in that study with Galactic latitudes of |Δl| < 10° from the anti-centre 
have a mean positive radial velocity of 10 ± 4 km s⁻¹. Despite the small number of 
our stars the result would be in conflict with an outward motion of the local standard 
of rest of about 14 km s⁻¹, which has sometimes been suggested to explain 
the Galactic asymmetry of the H I velocities. 
Number of Cepheids observed and expected. With such a small number of 
Cepheids in our sample it is impossible to carry out a detailed study of the space distribution 
at their distance from the Galactic centre. However, the following rough 
calculation shows that their presence far from the Galactic plane requires the presence 
of a flared disk. 

Consideration of the number of Cepheids in the solar neighbourhood suggests 
that the expected number of such stars in a column perpendicular to the Galactic 
plane and with a cross-sectional area of one square kiloparsec is about 60. With a 
scale height of 86 pc, as for the gas (HWHM 60 pc), and taking into account that 
the area on the Galactic plane between D = 20 kpc and D = 30 kpc covered by the 
OGLE survey is (slightly less than) 44 kpc², the number of Cepheids expected in 
that survey with z > 1 kpc is approximately 0.008. This is for solar neighbourhood 
densities. With a disk scale length of 3 kpc, the drop in density from 8.5 kpc to 
15 kpc is a factor of 9 and the expected number of Cepheids is about 0.001 (that is, 
for an unflared disk Cepheids are not expected). However, at the distance of our 
Cepheids the scale height of the gas is about 577 pc (HWHM 400 pc) and if the 
Cepheids follow the gas we predict the existence of about 18 in the relevant region. 
This calculation is obviously quite uncertain, but it is sufficient to show that our 
conclusion that these Cepheids are in the outer regions of a flared disk of scale height 
similar to that of the gas is entirely plausible. We see no other satisfactory explanation 
for these stars. We cannot rule out the possibility that a few more of the OGLE 
variables are classical Cepheids. 
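
The scale-height and scale-length figures quoted in this rough calculation can be checked numerically. The following is a minimal sketch, assuming an exponential vertical profile exp(-|z|/h) (an assumption that reproduces the quoted HWHM values) and an exponential radial profile. The function names are illustrative only, and the survey-geometry factors that enter the expected Cepheid counts are not modelled here.

```python
import math


def hwhm_from_scale_height(scale_height_pc):
    """Half-width at half-maximum of exp(-|z|/h): the density halves at z = h * ln 2."""
    return scale_height_pc * math.log(2.0)


def radial_density_drop(r_inner_kpc, r_outer_kpc, scale_length_kpc):
    """Factor by which an exponential disk density falls between two radii."""
    return math.exp((r_outer_kpc - r_inner_kpc) / scale_length_kpc)


# Scale heights quoted in the text: 86 pc (unflared) and 577 pc (flared).
print(round(hwhm_from_scale_height(86.0)))    # 60
print(round(hwhm_from_scale_height(577.0)))   # 400

# Density drop from 8.5 kpc to 15 kpc for a 3 kpc scale length.
print(round(radial_density_drop(8.5, 15.0, 3.0)))  # 9
```

Under these assumptions the two HWHM values and the factor-of-9 radial drop quoted in the text are recovered.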

Owing to the small numbers, the likely effects of non-uniform interstellar absorp- 
tion and the fact that young objects are expected to be found in groups rather than 
uniformly distributed over the field, it is not feasible to draw any strong conclusion 
from the fact that these Cepheids are confined to the positive longitude side of the 
OGLE field or that four of the five stars are at northern latitudes, despite the fewer 
OGLE fields there. 
Extended Data Table 1 | Fourier coefficients for the I-band light curves of the Cepheids 


OGLE # 
a ee 0.488 0.302 2.807 
0.307 4.263 2.444 


0.092 0.099 3.603 1.706 
0.443 0.212 3.242 
0.436 0.232 2.791 


The amplitude (A) ratios Rₙ₁ = Aₙ/A₁ and phase differences φₙ₁ = φₙ − nφ₁ are listed, where Aₙ and φₙ are parameters of the truncated Fourier series fitted to the photometric data. The subscripts refer to the 
order of the fit, so that n = 1 is the fundamental, n = 2 is the first harmonic and so on. P is the pulsation period in days. 
Extended Data Table 2 | Photometry of the Cepheids 
For each star the OGLE mean V and I magnitudes and the single-epoch IRSF J, H and Ks magnitudes for observations made on the given Julian date (JD) are listed. (μ₀)_I is the reddening-corrected distance modulus 
calculated from equations (1) and (2) and (μ₀)_K is the reddening-corrected distance modulus calculated from equations (3) and (4). A_I and A_Ks are the interstellar extinction values at I and Ks respectively, calculated 
simultaneously with the distance moduli. 
Extended Data Table 3 | Journal of spectroscopic observations 


OGLE-BLG-CEP-32 | 6498.47404 0.727 


3401088 
3349465 


For each object named, the heliocentric Julian date (HJD) when the spectrum was obtained and the phase of the Cepheid variations is listed. For the two reference stars, the catalogue radial velocities (Vp) are given. 
Crucial to many light-driven processes in transition metal complexes 
is the absorption and dissipation of energy by 3d electrons. But a 
detailed understanding of such non-equilibrium excited-state dynamics 
and their interplay with structural changes is challenging: a 
multitude of excited states and possible transitions result in phenomena 
too complex to unravel when faced with the indirect sensitivity 
of optical spectroscopy to spin dynamics and the flux limitations of 
ultrafast X-ray sources. Such a situation exists for archetypal polypyridyl 
iron complexes, such as [Fe(2,2'-bipyridine)₃]²⁺, where the 
excited-state charge and spin dynamics involved in the transition from 
a low- to a high-spin state (spin crossover) have long been a source of 
interest and controversy. Here we demonstrate that femtosecond-resolution 
X-ray fluorescence spectroscopy, with its sensitivity to spin 
state, can elucidate the spin crossover dynamics of [Fe(2,2'-bipyridine)₃]²⁺ 
on photoinduced metal-to-ligand charge transfer excitation. We are 
able to track the charge and spin dynamics, and establish the critical 
role of intermediate spin states in the crossover mechanism. We anticipate 
that these capabilities will make our method a valuable tool 
for mapping in unprecedented detail the fundamental electronic excited-state 
dynamics that underpin many useful light-triggered molecular 
phenomena involving 3d transition metal complexes. 

The femtosecond duration of the intense hard X-ray pulses generated 
by the LCLS (Linac Coherent Light Source) X-ray free-electron laser 
creates the opportunity to study spin dynamics with iron 3p-1s (Kβ) X-ray 
fluorescence spectroscopy. Figure 1 shows diagrams of the measurement 
technique and relevant energy levels (Fig. 1a-c), a 'ball-and-stick' 
representation of the [Fe(2,2'-bipyridine)₃]²⁺ complex (Fig. 1d), and the 
dependence of photoexcited spin crossover dynamics on the Fe-ligand 
distance (Fig. 1e). Given the roughly 100 femtosecond (fs) time resolution 
of the measurement, the subfemtosecond lifetime of the iron 1s 
core hole makes X-ray fluorescence an effectively instantaneous probe. 
A variety of distinct electronic excited states, including singlet and triplet 
metal-to-ligand charge transfer states (¹,³MLCT), triplet ligand field excited 
states (³T) and quintet ligand field excited states (⁵T₂) have been proposed 
to participate in the spin crossover mechanism (Fig. 1e). Distinguishing 
electronic excited states with different charge and spin density, 
such as the ¹,³MLCT, ³T and ⁵T₂ states listed above, represents a critical step 
in characterizing the spin crossover mechanism. 

Figure 2a shows the sensitivity of the iron Kβ fluorescence spectrum 
to the 3d spin moment, a sensitivity that results from the exchange 
interaction between the 3p and 3d electrons. Equally important, 
the ground-state spectra of iron coordination complexes with different 
ligation, but the same iron spin moment, exhibit similar Kβ fluorescence 
spectra. This insensitivity of Kβ fluorescence spectroscopy to the details 
of the coordinating ligands and the local symmetry of the complex has 
previously been used to characterize the electronic ground-state spin 
moment of a variety of molecular systems. We note that the insensitivity 
of the Kβ fluorescence spectrum to the electronic properties of the 
ligand means that the spectrum cannot be used to distinguish between 
singlet and triplet MLCT states. We utilize these spectra of distinct spin 
configurations to model transient difference spectra, that is, the time 
and energy dependence of the fluorescent amplitude difference between 
excited-state and ground-state spectra. Figure 2b shows the model complex 
difference spectra generated from the ground-state spectra of the 
relevant excited-state spin configurations and the singlet ground state. 
These model complex difference spectra confirm that each excited-state 
spin moment generates a distinct difference spectrum that cannot be reproduced 
by a linear combination of the other difference spectra (see 
Fig. 2, Extended Data Fig. 1 and Methods for details). 

The time-resolved Kβ fluorescence spectra provide the sensitivity to 
spin dynamics needed to answer a critical question regarding the spin 
crossover mechanism: does the ⁵T₂ state form directly from the ¹,³MLCT 
state, or does spin crossover involve a ³T transient? Ultraviolet-visible 
transient absorption, time-resolved luminescence, and time-resolved 
iron K-edge XANES have been used to characterize the spin 
crossover dynamics of [Fe(2,2'-bipyridine)₃]²⁺, and the similar rates measured 
for ³MLCT decay and ⁵T₂ formation were attributed to the ³MLCT 
excited state converting directly to the ⁵T₂ excited state, although a conversion 
including transient triplet states was also considered. Potential 
energy surfaces calculated for this system allow either mechanism to 
proceed with minimal reaction barriers, but cannot explain why the 
³MLCT and ⁵T₂ states should be strongly coupled: the leading-order 
spin-orbit interaction cannot couple the MLCT and ⁵T₂ states because 
a transition between these states requires the excitation of two electrons 
on two distinct centres, whereas spin-orbit coupling is predominantly a 
single-centre, one-electron operator. 

Figure 2c, d shows the transient difference spectra for [Fe(2,2'-bipyridine)₃]²⁺ 
measured for a 50-fs and a 1-ps (picosecond) time delay. 
The spectrum in Fig. 2d clearly demonstrates the ease of identifying the 
⁵T₂ state with the Kβ fluorescence spectrum. Determining whether spin 
crossover from the ¹,³MLCT to the ⁵T₂ proceeds through a transient ³T 
state proves more challenging because the relaxation dynamics do not 
lead to a time regime where the majority of the excited molecules reside 
in the ³T excited state. The significant difference between the spectra in 
Fig. 2c and d, however, clearly demonstrates the presence of excited-state 
species other than the ⁵T₂ state. With statistically rigorous kinetic modelling, 
¹,³MLCT, ³T and ⁵T₂ states can be clearly distinguished in the relaxation 
dynamics probed with Kβ fluorescence. 

The ability to spectroscopically distinguish between ¹,³MLCT, ³T 
and ⁵T₂ electronic excited states allows the spin crossover mechanism 
to be determined from the time evolution of the iron Kβ fluorescence 
spectrum. The time-resolved difference spectra, model fits of the difference 
spectra, and the parameters extracted from the fit can be found 
in Fig. 3, Extended Data Figs 2-4 and Extended Data Table 1. We have 
fitted the difference spectra to two distinct models: one where the ¹,³MLCT 
decays directly to a ⁵T₂ excited state and one where the ¹,³MLCT relaxes 
to the ⁵T₂ state via a ³T transient. Figure 3b, c shows the time-dependent 
difference signal measured at two X-ray fluorescence energies: 7,061 eV, 
where the difference signal is largest, and 7,054 eV, where the triplet 


Figure 1 | Schematic depiction of ultrafast X-ray 
fluorescence detection of spin crossover 
dynamics. a, Experimental set-up involving liquid 
jet for sample replenishment, optical laser pump, 
and 8-keV X-ray beam for generating X-ray 
fluorescence measured with a dispersive crystal 
spectrometer. b, Energy level diagram for Kβ 
fluorescence involving photo-ionization of a 1s 
electron and X-ray fluorescence originating from 
the transition of a 3p electron to the 1s hole. 
c, Schematic diagram of how the spin crossover 
dynamics influence the time-dependent Kβ 
fluorescence difference spectra. d, Molecular 
structure of [Fe(2,2'-bipyridine)₃]²⁺ (red, Fe atom; 
blue, N; grey, C; H not shown). e, A schematic 
drawing of the potential energy surfaces involved in 
the spin crossover dynamics. 




model complex has a spectral signature clearly distinct from the ¹,³MLCT 
and ⁵T₂ states as shown in Fig. 2b. The fits in Fig. 3b, c have been determined 
from a global analysis of the full time-dependent spectra. The 
statistical significance of the more complex kinetic model involving the 
triplet transient can be determined from an F-test comparison of the two 
models (described in Methods). The reduction in residuals achieved 
with the model containing the triplet transient is sufficient to reject the 
direct ¹,³MLCT→⁵T₂ model with greater than 95% confidence. Note 
that the successful use of a kinetic model to describe subpicosecond 
Figure 2 | Spin-dependent iron Kβ fluorescence 
spectra. a, The Kβ fluorescence spectra of ground-state 
iron complexes with different spin moments: 
singlet ([Fe(2,2'-bipyridine)₃]²⁺, red), doublet 
([Fe(2,2'-bipyridine)₃]³⁺, blue), triplet (iron(II) 
phthalocyanine, green), quartet (iron(III) 
phthalocyanine chloride, red dashed), and quintet 
([Fe(phenanthroline)₂(NCS)₂], blue dashed). 
b, Model complex difference spectra for the 
¹,³MLCT, ³T and ⁵T₂ excited states constructed by 
subtracting the singlet model complex spectrum 
from the doublet, triplet and quintet model 
Figure 3 | Time-dependent photo-induced iron Kβ difference spectra and 
kinetic modelling of spin crossover dynamics. a, Time-dependent optically induced 
two-dimensional Kβ fluorescence difference spectra for [Fe(2,2'-bipyridine)₃]²⁺. 
b, c, The difference signal measured at a Kβ fluorescence 
energy of 7,061 eV (b) and 7,054 eV (c) for [Fe(2,2'-bipyridine)₃]²⁺ (red stars), 
as well as the best fit achieved for kinetic models with (blue) or without (green 
dashed) a ³T transient. The error bars in b and c reflect the standard error for 
the difference signal determined from six independent measurements. 


dynamics implies that the Kβ spectra do not depend significantly on 
the time-evolving nuclear structure, consistent with the insensitivity 
of the ground-state Kβ spectra to the ligand details. 
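
A comparison of two nested least-squares fits, such as the kinetic models with and without the triplet transient, is commonly made with the standard F-statistic. The sketch below shows that textbook calculation only; it is not necessarily the exact procedure described in the Methods, and the function name and all numbers are hypothetical.

```python
def nested_f_statistic(rss_simple, rss_complex, n_params_simple,
                       n_params_complex, n_points):
    """F-statistic for two nested least-squares fits.

    rss_* are residual sums of squares; the complex model must have
    more free parameters than the simple one.
    """
    extra_params = n_params_complex - n_params_simple
    dof_complex = n_points - n_params_complex
    if extra_params <= 0:
        raise ValueError("the complex model needs more parameters")
    if dof_complex <= 0:
        raise ValueError("not enough data points for the complex model")
    numerator = (rss_simple - rss_complex) / extra_params
    denominator = rss_complex / dof_complex
    return numerator / denominator


# Hypothetical values, for illustration only (not taken from the paper).
f_value = nested_f_statistic(rss_simple=120.0, rss_complex=100.0,
                             n_params_simple=4, n_params_complex=5,
                             n_points=105)
print(f_value)  # 20.0
```

The simpler model is rejected at the 95% level when the statistic exceeds the 95th percentile of the F-distribution with (extra parameters, residual degrees of freedom of the complex model) degrees of freedom.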

The successful analysis of the experimental data relies on two constraints 
presented by the model spectra shown in Fig. 2b and two constraints 
derived from the kinetic models. We force (1) the shape and (2) the 
relative amplitudes of the difference signals for the ¹,³MLCT, ³T and 
⁵T₂ electronic excited states to match the shape and relative amplitudes 
of the model complex difference spectra. We also require (3) all X-ray fluorescence 
energies to be fitted with a single time zero and (4) all MLCT excited 
states to undergo spin crossover, consistent with previous measurements 
of the spin crossover quantum yield. The ultrafast rise of the difference 
signal shown in Fig. 3b greatly constrains the value of time zero and the final 
⁵T₂ state population. For the fit to the direct spin crossover mechanism 
shown in Fig. 3b, the fast rise in signal at 7,061 eV requires a fast rise in 
⁵T₂ population. As shown in Fig. 3c, the fast rise in the direct mechanism 
fit at 7,061 eV also leads to a fast drop in signal at 7,054 eV, because the 
⁵T₂ state has a negative difference signal at 7,054 eV. For the fit to the 
sequential spin crossover mechanism also shown in Fig. 3b, the fast rise in 
signal at 7,061 eV can be accommodated initially by a rise in ³T population. 
Because the ³T state does not have a negative difference signal at 
7,054 eV, the fast rise in ³T population does not lead to a fast drop at 
7,054 eV. The stepwise transition through the ³T leads to a delayed onset 
of the drop in fluorescence amplitude at 7,054 eV relative to the rise in 
signal at 7,061 eV, consistent with the experimental data. For the direct 
model, a shift in time zero to fit the data in Fig. 3c would lead to a poor fit 
of the data in Fig. 3b. 

Relaxation to the ⁵T₂ excited state via a ³T transient provides a more 
satisfying explanation for the relaxation dynamics. We speculate that 
the sequential relaxation occurs more promptly than the direct crossover 
from the ¹,³MLCT to the ⁵T₂ excited state because the sequential 
transition involves single electronic transitions coupled by a spin-orbit 
operator, whereas the direct transition involves the simultaneous transition 
of two distinct electrons on two centres and cannot occur with the 
first-order spin-orbit operator. The sequential relaxation, like the direct 
transition, provides an energetically feasible pathway with minimal reaction 
barriers between states that can be coupled with standard spin-orbit 
interactions. The spin-orbit matrix elements in conjunction with the 
calculated potential energies of a variety of electronic excited states of 
[Fe(2,2'-bipyridine)₃]²⁺ as a function of the metal-ligand bond distance 
provide an approximate explanation for the fast intersystem crossing 
and the extremely short lifetime of the ³T excited state. A diagram of these 
potential energy surfaces can be found in Fig. 1e. In principle, the triplet 
ligand field excited state could be either a ³T₁ or a ³T₂ state. Computations 
indicate a crossing of the ³T₂ state in the Franck-Condon region of the 
¹,³MLCT excited state and that the ¹,³MLCT→³T₂→⁵T₂ pathway dominates; 
however, relaxation trajectories involving the ³T₁ ligand field 
excited state remain plausible, and more definite conclusions will require 
a more complete calculation of the multidimensional potential energy 
surfaces, including the potentially important role of metal-ligand torsional 
motion. The sequential model fit in Fig. 3 gives a 150 ± 50 fs time 
constant for ¹,³MLCT decay to the ³T state and a 70 ± 30 fs time constant 
for ³T decay to the ⁵T₂ state. Although the mechanistic conclusions we 
have drawn from our measurements differ from the earlier interpretation, 
our experimental findings do not contradict the earlier results, but 
rather expand on them. The extracted decay time for the ¹,³MLCT excited 
state and the effective rise time for the ⁵T₂ excited state agree with the time 
constants observed previously within experimental error. The similarity 
of the ¹,³MLCT decay time and the ⁵T₂ rise time results from the rate of ³T 
decay being greater than that of ³T formation. This inhibits the build-up 
of molecules in the ³T excited state and challenges the temporal differentiation 
of the distinct electronic states involved in spin crossover (see 
Extended Data Fig. 2d). Only with a technique highly sensitive to the iron 
spin multiplicity can the presence of the ³T transient excited state in the 
relaxation dynamics be robustly resolved. 
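
The limited build-up of the triplet population can be illustrated with the textbook solution for sequential first-order kinetics. The sketch below assumes unit initial MLCT population, the central time constants quoted above (150 fs and 70 fs), no instrument response, and that decay of the quintet state back to the ground state is negligible on this timescale. The function names are illustrative and this is not the authors' fitting code.

```python
import math


def sequential_populations(t_fs, tau1_fs=150.0, tau2_fs=70.0):
    """Populations (MLCT, 3T, 5T2) at time t >= 0 for MLCT -> 3T -> 5T2."""
    k1 = 1.0 / tau1_fs  # MLCT -> 3T rate
    k2 = 1.0 / tau2_fs  # 3T -> 5T2 rate (must differ from k1)
    mlct = math.exp(-k1 * t_fs)
    triplet = k1 / (k2 - k1) * (math.exp(-k1 * t_fs) - math.exp(-k2 * t_fs))
    quintet = 1.0 - mlct - triplet  # population is conserved in this sketch
    return mlct, triplet, quintet


def triplet_peak(tau1_fs=150.0, tau2_fs=70.0):
    """Time (fs) and value of the maximum intermediate (3T) population."""
    k1 = 1.0 / tau1_fs
    k2 = 1.0 / tau2_fs
    t_max = math.log(k2 / k1) / (k2 - k1)
    return t_max, sequential_populations(t_max, tau1_fs, tau2_fs)[1]


t_max, peak = triplet_peak()
print(f"3T population peaks near {t_max:.0f} fs at about {peak:.2f}")
```

With these inputs the intermediate never holds more than roughly a quarter of the excited molecules, in line with the statement that no time window exists in which the ³T state dominates.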

The complex excited-state electronic structure of molecules containing 
transition metals has inhibited the unambiguous interpretation of exper- 
imental measurements and the development of excited-state quantum 
dynamics simulations. We have demonstrated here that ultrafast X-ray 
fluorescence spectroscopy enables robust measurements of the charge and 
spin dynamics integral to excited-state relaxation in 3d transition-metal 
coordination complexes, which represents an important step towards an 
incisive mechanistic understanding of excited-state dynamics in 3d trans- 
ition metal complexes. 


METHODS SUMMARY 


We performed femtosecond hard X-ray fluorescence measurements on a 50 mM 
solution of electronically excited [Fe(2,2'-bipyridine)₃]²⁺ in water at the XPP instrument 
at the LCLS. The experiment used a 0.1-mm-thick planar liquid jet oriented at 
45° relative to the direction of the incident X-ray beam. The sample solution was 
collinearly excited with a 70-fs FWHM 520-nm laser beam. The absorption spectrum 
and laser power dependence can be found in Extended Data Fig. 5. A cylindrically 
bent energy dispersive X-ray emission spectrometer and a 2D pixel array detector 
(PAD) were used to capture the iron 3p-1s (Kβ) fluorescence. The PAD response 
calibration involved a pixel-dependent dark current subtraction, a common mode 
offset, and an experimentally determined gain correction. The final Kβ fluorescence 
spectrum for each time-step was obtained by integrating the signal in the non-dispersive 
direction. The shot-to-shot X-ray-optical relative time of arrival fluctuations were 
measured with a timing diagnostic and used to sort each shot by its relative time of 
arrival. We measured the Kβ fluorescence spectra of a series of iron model complexes 
with different spin states at beamline 6-2 of SSRL. We have used electronic ground-state 
spectra and kinetic models, with and without triplet transients, to analyse the 
time evolution of the Kβ fluorescence spectra. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Experimental procedures. We performed femtosecond hard X-ray fluorescence 
measurements on a 50 mM solution of [Fe(2,2'-bipyridine)₃]²⁺ in water at the 
X-ray pump-probe (XPP) instrument at the Linac Coherent Light Source (LCLS). 
The experiment used a 0.1-mm-thick planar liquid jet oriented at an angle of 45° 
with respect to the direction of the incident X-ray beam. We measured the ultraviolet-visible 
absorption spectrum of the solution before and after the measurement 
to ensure no appreciable sample damage had occurred. The sample solution 
was collinearly excited with a 70 fs FWHM 520 nm laser beam (120 mJ cm⁻²) generated 
by optical parametric amplification of the 800 nm output of a Ti:sapphire 
regenerative amplifier laser system (Coherent, Legend). With 520 nm light, we excited 
[Fe(2,2'-bipyridine)₃]²⁺ at the peak of the MLCT band (Extended Data Fig. 5a). We 
set the pump laser fluence to maximize excitation yield, while avoiding other deleterious 
photophysical phenomena. Previous time-resolved hard X-ray spectroscopy 
measurements of iron spin crossover compounds have used higher, often significantly 
higher, optical laser fluence. We used an excitation laser fluence where the 
transient optical signal changes linearly with pump fluence, as shown in Extended 
Data Fig. 5b. The 8 keV X-ray laser pulses, with an average bandwidth of 0.3%, were 
focused using Be compound refractive lenses to a 50 μm diameter spot size at the 
sample position. Shot-to-shot fluctuations in the X-ray incidence energy and bandwidth 
do not influence the X-ray fluorescence spectrum when the X-ray energy is well 
above the core ionization threshold. For iron, with a 1s ionization threshold of 
7.112 keV, the 8 keV X-ray energy used in the experiment achieves this goal. 

The incoming X-ray pulse energy was measured using non-invasive diagnostics 
before the sample. A high-resolution energy dispersive X-ray emission spectrometer, 
based on the von Hamos geometry, was used to capture the iron 3p-1s 
(Kβ) fluorescence. The spectrometer was equipped with 4 cylindrically bent (0.5 m 
radius) Ge(620) crystal analysers and set to cover a Bragg angle range from 78.0° to 
80.4°. The CSPAD 2D pixel array detector (388 × 370 pixels) intersected the X-rays 
diffracted from the crystal analysers in an energy range from 7,033 to 7,084 eV. 
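
The link between the quoted Bragg angles and photon energies follows from Bragg's law, E = hc / (2 d sin θ). As a rough check, the sketch below assumes first-order reflection and an approximate room-temperature germanium lattice constant of 5.658 Å; this value and the function names are assumptions for illustration, not numbers taken from the paper.

```python
import math

HC_EV_ANGSTROM = 12398.42      # h*c in eV*angstrom
GE_LATTICE_ANGSTROM = 5.658    # assumed approximate lattice constant of Ge


def cubic_d_spacing(lattice_angstrom, h, k, l):
    """Interplanar spacing of the (hkl) planes in a cubic crystal."""
    return lattice_angstrom / math.sqrt(h * h + k * k + l * l)


def bragg_energy_ev(theta_deg, d_angstrom, order=1):
    """Photon energy diffracted at Bragg angle theta (Bragg's law)."""
    theta = math.radians(theta_deg)
    return order * HC_EV_ANGSTROM / (2.0 * d_angstrom * math.sin(theta))


d_620 = cubic_d_spacing(GE_LATTICE_ANGSTROM, 6, 2, 0)
for angle in (78.0, 80.4):
    print(f"{angle:.1f} deg -> {bragg_energy_ev(angle, d_620):.0f} eV")
```

Under these assumptions the 78.0° to 80.4° range corresponds to roughly 7,084 eV down to about 7,028 eV, so the quoted detector window of 7,033 to 7,084 eV sits within the angular range covered by the analysers.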

The detector response calibration involved a pixel-dependent dark current (pedestal) 
subtraction, a common mode offset, and an experimentally determined gain 
map. The gain map was built from histograms of each pixel response extracted from 
multiple images (after dark current and common mode offset corrections) collected 
over many minutes. Gaussians were fitted to the zero and one photon peaks of the 
histograms, enabling fine-tuned dark and gain corrections to the histograms directly 
from the data. The zero photon peaks were centred at zero analogue-to-digital units 
and the separation between the zero and one photon peaks was scaled to unity for all 
pixels. The counts for each pixel in a given time-step were obtained by averaging the 
analogue-to-digital values above a threshold of 2.5σ of the zero-photon peak and 
scaling to the incident X-ray intensity. The final 1D spectrum for each time-step was 
obtained by integrating the signal in the non-dispersive direction. 
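
The thresholding and integration steps can be sketched in a few lines. This is one possible reading of the procedure, not the authors' analysis code: it assumes gain-corrected pixel values in photon units, treats values at or below the threshold as zero before averaging over shots, and takes the image columns as the dispersive (energy) direction. All names are hypothetical, and the scaling to incident X-ray intensity is omitted.

```python
def thresholded_pixel_mean(values, sigma_zero, n_sigma=2.5):
    """Mean over shots of one pixel's signal, zeroing sub-threshold values.

    values: gain-corrected readings of one pixel over many shots.
    sigma_zero: width of that pixel's zero-photon peak.
    """
    threshold = n_sigma * sigma_zero
    kept = [v if v > threshold else 0.0 for v in values]
    return sum(kept) / len(kept)


def spectrum_1d(image):
    """Integrate a 2D image (list of rows) over rows, one value per column.

    Assumes columns index energy, so rows are the non-dispersive direction.
    """
    return [sum(column) for column in zip(*image)]


# Toy data, for illustration only.
print(thresholded_pixel_mean([0.1, 1.0, -0.2, 0.9], sigma_zero=0.15))
print(spectrum_1d([[1, 2], [3, 4]]))  # [4, 6]
```

Zeroing sub-threshold readings suppresses the accumulation of read-out noise from the many pixels that record no photon in a given shot.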

The shot-to-shot X-ray-optical relative time of arrival fluctuations were measured 
for every X-ray-optical pulse pair with a timing diagnostic tool based on optical 
detection of X-ray generated carriers in a Si₃N₄ thin film. A description of the time 
diagnostic tool and the demonstrated performance of the tool can be found elsewhere. 
This experimental measure of the relative timing can be used to sort each 
experimental shot by the relative time of arrival. Although the timing tool provides 
an accurate measure of the shot-to-shot variation in the relative time of arrival between 
the X-ray and optical laser pulses, it does not provide an accurate measure of the 
instrument response function. The timing tool uses changes in the Si₃N₄ dielectric 
function to modify the transmission of a chirped white light pulse through the Si₃N₄ 
thin film. These changes in the dielectric function result from the increase in free 
carriers generated by X-ray ionization, Auger relaxation and impact ionization. The 
temporal response is the convolution of these complex dynamics with the cross-correlation 
of the X-ray and optical laser pulses. Without a detailed model of the 
carrier generation, the cross-correlation cannot be extracted from the timing tool. At 
present, no experimental means of cross-correlating the hard X-ray and optical pulses 
has been demonstrated. 

The final time resolution of the experiment results from the convolution of the 
optical and X-ray pulse durations, the group velocity walk-off of the X-ray and 
optical pulses in the sample and the error in the relative time of arrival measurement. 
These factors would predict a cross-correlation of roughly 150 fs FWHM. In 
the data analysis, the instrument response function FWHM and time zero (coincident 
arrival of the X-ray and optical pulses) are fit parameters. 
Kβ fluorescence spectra for model complexes. The 3p-1s X-ray (Kβ) fluorescence 
spectra of model complexes play an important role in our analysis of the time-dependent 
data. The Kβ fluorescence spectra of 3d transition-metal ions reflect the 3p-3d exchange 
interaction, which makes the line shapes sensitive to the spin state of the 
transition metal atom. Kβ fluorescence provides a powerful technique for 
spin state studies, particularly when there are advantages of working with penetrating 
hard X-rays. When a sample contains multiple spin states, the spin state distribution 
can be readily and precisely calculated from the line shape variations. 
We measured the Kβ fluorescence spectra of a series of iron complexes with 
different spin states at beamline 6-2 of the Stanford Synchrotron Radiation Lightsource 
(SSRL). All the samples were cooled to 10 K to reduce the influence of X-ray 
damage. The static spectra, collected with a multi-crystal high-resolution X-ray emission 
spectrometer, are shown in Fig. 2a. 

We use the model complex difference spectra generated from molecules that 
have different spin multiplicities in their electronic ground state to model the time-dependent 
populations of electronic excited states with different spin multiplicities. 
We verify the validity of using the model complex difference spectra generated from 
the quintet [Fe(phenanthroline)₂(NCS)₂] and the singlet [Fe(2,2'-bipyridine)₃]²⁺ 
model compounds for the quintet excited state by comparing it with the transient 
difference spectra of [Fe(2,2'-bipyridine)₃]²⁺ after a 1-ps time delay (see Fig. 2d). 
The validity of model complex difference spectra for the ¹,³MLCT and ³T excited 
states proves more challenging to demonstrate because we do not isolate these 
excited states at any time delay in our pump-probe measurements (the fit to the 50-fs 
time delay spectra shown in Fig. 2c indicates a population ratio of 5:1.3:1 for the 
¹,³MLCT:³T:⁵T₂ excited states). 

Despite this limitation, the model for the ¹,³MLCT excited state generated from doublet 
[Fe(2,2'-bipyridine)₃]³⁺ and singlet [Fe(2,2'-bipyridine)₃]²⁺ compounds should be 
robust since the only distinction is the presence of the electron on the 2,2'-bipyridine 
ligand, which should have minimal impact on the Kβ fluorescence spectrum. For the 
³T transient, no long-lived triplet excited state can be used to extract an excited-state 
Kβ fluorescence difference spectrum as an internal reference. Instead, we use the 
ground-state model complex difference spectrum obtained from triplet Fe(II) phthalocyanine 
(FePc) and singlet [Fe(2,2'-bipyridine)₃]²⁺ Kβ spectra as our reference 
difference spectra. We used the four-coordinate FePc, rather than an octahedral 
model complex, because octahedral Fe(II) complexes cannot have a triplet ground 
state. While de Beer et al. have shown that tetrahedral, octahedral, and square planar 
molecules in the same quintet or sextet spin state have very similar Kβ spectra, this 
cannot be demonstrated experimentally for intermediate spin states. Instead, we use 
theoretical calculations to demonstrate this point. We theoretically calculated the Kβ 
fluorescence spectra of a four-coordinate square planar and a six-coordinate octahedral 
ferrous complex using atomic multiplet theory. This theory is the standard 
method for calculating and interpreting hard X-ray fluorescence spectra. For all 
calculations, the Slater-Condon parameters were reduced to 80% of their atomic 
value and the 3d orbital and spin angular momentum (LS) coupling was switched off 
for simplicity. The Kβ spectra were calculated as a 3p-1s fluorescence following 1s 
ionization. For FePc, we use the previously published crystal field parameters 
(10Dq = 2.7 eV, Ds = 0.86 and Dt = 0.247) in our calculations. For the six-coordinate 
octahedral complex calculation, we used a 10Dq = 1.5 eV, consistent with the 
experimental 10Dq ~ 1.5 eV measured for [Fe(2,2'-bipyridine)₃]²⁺ (ref. 9). This 
value also ensures a low spin (S = 0) ground state, a high spin (S = 2) first excited 
state and an intermediate spin (S = 1) second excited state. 

Extended Data Fig. 1a shows the calculated Kβ fluorescence spectra for both the 
four- and six-coordinate complexes. The square planar and octahedral symmetries 
have similar triplet state Kβ fluorescence spectra, consistent with prior experimental 
and theoretical findings for high spin complexes. The accuracy of 
the calculations can also be assessed by comparing calculated and experimental 
difference spectra. In Extended Data Fig. 1b we show a comparison between the 
calculated difference spectrum generated when subtracting an octahedral crystal 
field singlet state from the square planar triplet ground state and the experimental 
difference spectrum generated by subtracting the singlet [Fe(2,2'-bipyridine)₃]²⁺ spectrum 
from the triplet FePc spectrum. The calculated difference spectrum reproduces 
the qualitative features of the experimental difference spectrum. The insensitivity of 
the calculated spectra to the coordination geometry and the ability of the calculations 
to reproduce the main features of the experimental difference spectrum validate 
the use of the FePc fluorescence spectrum as a model for the triplet excited state 
of [Fe(2,2'-bipyridine)₃]²⁺. 

Using model complex difference spectra has proven more fruitful for the kinetic 
modelling than singular value decomposition (SVD). The model complex difference 
spectra demonstrate that differentiation of the ¹,³MLCT and the ³T excited states 
depends upon both the shape of the difference spectra and the relative amplitudes of 
the difference spectra. To first order, the integrated area of the Kβ fluorescence spectra 
does not change with spin state. The integral of the absolute value of the difference spectrum, 
however, depends linearly on the magnitude of the spin change. This robust 
and reproducible aspect of Kβ fluorescence spectroscopy makes the relative amplitudes 
of the difference spectra an important distinction. SVD, however, struggles to 
differentiate species when a difference in relative amplitude is a key distinguishing 
feature of the difference spectra. For this reason, we have used model complex difference 
spectra, rather than SVD, to model the time-resolved data. 

Kinetic modelling of the [Fe(2,2′-bipyridine)₃]²⁺ experimental population dynamics.
We have used two distinct kinetic models to analyse the time-dependent electron
dynamics in [Fe(2,2′-bipyridine)₃]²⁺. For the direct transition between ¹,³MLCT
and ⁵T₂, without a ³T transient state, the relaxation mechanism can be expressed as
follows:

\[
{}^{1,3}\mathrm{MLCT} \xrightarrow{k_1} {}^{5}\mathrm{T}_2 \xrightarrow{k_3} {}^{1}\mathrm{A}_1
\]

where ¹,³MLCT corresponds to the electronic excited state populated by optical
excitation, ⁵T₂ corresponds to the quintet ligand field state, and ¹A₁ represents the
electronic ground state. The differential rate equations for each species are given by
the following mass balance simultaneous equations,

\[
\begin{aligned}
\frac{d[{}^{1,3}\mathrm{MLCT}]}{dt} &= -k_1 [{}^{1,3}\mathrm{MLCT}] \\
\frac{d[{}^{5}\mathrm{T}_2]}{dt} &= k_1 [{}^{1,3}\mathrm{MLCT}] - k_3 [{}^{5}\mathrm{T}_2] \\
\frac{d[{}^{1}\mathrm{A}_1]}{dt} &= k_3 [{}^{5}\mathrm{T}_2]
\end{aligned}
\]
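
The integrated rate equations provide the following time-dependent populations
for the three species, with only ¹,³MLCT populated at time zero,

\[
\begin{aligned}
[{}^{1,3}\mathrm{MLCT}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, e^{-k_1 t} \\
[{}^{5}\mathrm{T}_2] &= [{}^{1,3}\mathrm{MLCT}]_0 \, \frac{k_1}{k_3 - k_1} \left( e^{-k_1 t} - e^{-k_3 t} \right) \\
[{}^{1}\mathrm{A}_1] &= [{}^{1,3}\mathrm{MLCT}]_0 \left( 1 + \frac{k_1 e^{-k_3 t} - k_3 e^{-k_1 t}}{k_3 - k_1} \right)
\end{aligned}
\]

From prior ultrafast measurements, we know that the lifetime of the ⁵T₂ excited state
is roughly 660 ps (refs 6, 13, 15). The long lifetime of the ⁵T₂ excited state allows us to
set k₃ ~ 0 when we model the kinetics in the first couple of picoseconds. The integrated
rate equations can be reduced to:

\[
\begin{aligned}
[{}^{1,3}\mathrm{MLCT}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, e^{-k_1 t} \\
[{}^{5}\mathrm{T}_2] &= [{}^{1,3}\mathrm{MLCT}]_0 \left( 1 - e^{-k_1 t} \right)
\end{aligned}
\]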

For the sequential kinetic model with a ³T transient state, the relaxation mechanism
can be expressed as follows:

\[
{}^{1,3}\mathrm{MLCT} \xrightarrow{k_1} {}^{3}\mathrm{T} \xrightarrow{k_2} {}^{5}\mathrm{T}_2 \xrightarrow{k_3} {}^{1}\mathrm{A}_1
\]

where ¹,³MLCT corresponds to the electronic excited state populated by optical
excitation, ³T corresponds to the triplet ligand field excited state, ⁵T₂ corresponds
to the quintet ligand field excited state, and ¹A₁ represents the electronic ground state.
The differential rate equations for each species are given by the following mass balance
simultaneous equations:
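
\[
\begin{aligned}
\frac{d[{}^{1,3}\mathrm{MLCT}]}{dt} &= -k_1 [{}^{1,3}\mathrm{MLCT}] \\
\frac{d[{}^{3}\mathrm{T}]}{dt} &= k_1 [{}^{1,3}\mathrm{MLCT}] - k_2 [{}^{3}\mathrm{T}] \\
\frac{d[{}^{5}\mathrm{T}_2]}{dt} &= k_2 [{}^{3}\mathrm{T}] - k_3 [{}^{5}\mathrm{T}_2] \\
\frac{d[{}^{1}\mathrm{A}_1]}{dt} &= k_3 [{}^{5}\mathrm{T}_2]
\end{aligned}
\]

The integrated rate equations provide the following time-dependent populations for
the four species:

\[
\begin{aligned}
[{}^{1,3}\mathrm{MLCT}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, e^{-k_1 t} \\
[{}^{3}\mathrm{T}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, \frac{k_1}{k_2 - k_1} \left( e^{-k_1 t} - e^{-k_2 t} \right) \\
[{}^{5}\mathrm{T}_2] &= [{}^{1,3}\mathrm{MLCT}]_0 \, k_1 k_2 \left[ \frac{e^{-k_1 t}}{(k_2 - k_1)(k_3 - k_1)} + \frac{e^{-k_2 t}}{(k_1 - k_2)(k_3 - k_2)} + \frac{e^{-k_3 t}}{(k_1 - k_3)(k_2 - k_3)} \right] \\
[{}^{1}\mathrm{A}_1] &= [{}^{1,3}\mathrm{MLCT}]_0 - [{}^{1,3}\mathrm{MLCT}] - [{}^{3}\mathrm{T}] - [{}^{5}\mathrm{T}_2]
\end{aligned}
\]

The long lifetime of the ⁵T₂ excited state allows us to set k₃ ~ 0 when we model the
kinetics in the first couple of picoseconds. The integrated rate equations can be reduced
to:

\[
\begin{aligned}
[{}^{1,3}\mathrm{MLCT}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, e^{-k_1 t} \\
[{}^{3}\mathrm{T}] &= [{}^{1,3}\mathrm{MLCT}]_0 \, \frac{k_1}{k_2 - k_1} \left( e^{-k_1 t} - e^{-k_2 t} \right) \\
[{}^{5}\mathrm{T}_2] &= [{}^{1,3}\mathrm{MLCT}]_0 \left( 1 - \frac{k_2 e^{-k_1 t} - k_1 e^{-k_2 t}}{k_2 - k_1} \right)
\end{aligned}
\]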

To fit the experimental data to a kinetic model, we must convolve the kinetic model
with the instrument response function, which we describe with a Gaussian function.
Taking the example of [¹,³MLCT] = [¹,³MLCT]₀ exp(−k₁t), which is an exponential decay
starting at time zero (t₀), it will be expressed as

\[
[{}^{1,3}\mathrm{MLCT}](t) = [{}^{1,3}\mathrm{MLCT}]_0 \left[ H(t - t_0) \, e^{-k_1 (t - t_0)} \right] \otimes \frac{1}{\sigma \sqrt{2\pi}} \, e^{-t^2 / 2\sigma^2}
\]

where H is the Heaviside step function and σ is the temporal width of the instrument
response function.
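
For a Gaussian response, the convolution of a step-started exponential decay has a closed form that is convenient when fitting. The sketch below is a minimal illustration in Python, not the fitting code used for the paper. The function names are hypothetical, the closed-form expression (an exponentially modified Gaussian) is a standard result rather than something quoted from the text, and the FWHM conversion assumes that the Gaussian width σ is a standard deviation. A brute-force numerical integral is included only as a cross-check.

```python
import math


def irf_convolved_decay(t, k, t0, sigma, amplitude=1.0):
    """Exponential decay (rate k, starting at t0) convolved with a
    unit-area Gaussian of standard deviation sigma, in closed form."""
    dt = t - t0
    arg = (k * sigma ** 2 - dt) / (sigma * math.sqrt(2.0))
    return amplitude * 0.5 * math.exp(-k * dt + 0.5 * (k * sigma) ** 2) * math.erfc(arg)


def numeric_convolved_decay(t, k, t0, sigma, n=20000):
    """Cross-check: trapezoid-rule integral of exp(-k s) * gaussian(t - t0 - s)
    over s >= 0, truncated at 8 sigma either side of the Gaussian centre."""
    dt = t - t0
    lower = max(0.0, dt - 8.0 * sigma)
    upper = dt + 8.0 * sigma
    if upper <= lower:
        return 0.0
    h = (upper - lower) / n
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n + 1):
        s = lower + i * h
        weight = 0.5 if i in (0, n) else 1.0
        gauss = norm * math.exp(-((dt - s) ** 2) / (2.0 * sigma ** 2))
        total += weight * math.exp(-k * s) * gauss
    return total * h


def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


if __name__ == "__main__":
    k1 = 1.0 / 150.0   # 1/fs, from the 150 fs MLCT lifetime in Extended Data Table 1
    sigma = 56.0       # fs, from Extended Data Table 1
    for delay in (-100.0, 0.0, 50.0, 200.0, 500.0):
        print(delay, irf_convolved_decay(delay, k1, 0.0, sigma))
```

Under these assumptions, `fwhm_from_sigma(56.0)` is about 132 fs, which is consistent with the 130 ± 20 fs FWHM listed alongside σ = 56 ± 8 fs in Extended Data Table 1.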

Statistical determination of the correct kinetic model. Given two distinct kinetic
models, we must determine which model best represents the experimental data.
Choosing the model with smaller residual sum of squares (RSS) is not sufficient
because the two models do not have the same number of fit parameters. We have
used the statistical F-test to determine whether the model with or without a ³T
transient provides the best fit of the experimental data.

The F-test provides a statistically robust method for comparing the quality of two
models with a different number of fit parameters when the simpler model 1 can be
'nested' within the more complex model 2. Model 1 has p₁ parameters, and model 2 has
p₂ parameters, where p₂ > p₁. For any choice of parameters in model 1, model 2
should always be able to fit the data at least as well as model 1. Thus, model 2
typically will have a lower RSS than model 1. The F-test allows us to determine the
statistical significance of this difference in RSS. The F statistic can be calculated by

\[
F = \frac{\dfrac{\mathrm{RSS}_1 - \mathrm{RSS}_2}{p_2 - p_1}}{\dfrac{\mathrm{RSS}_2}{n - p_2}}
  = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)(n - p_2)}{\mathrm{RSS}_2 \, (p_2 - p_1)}
\]

where n is the number of data points (time delays) fitted by the two models. For the
null hypothesis that model 2 does not provide a fit statistically superior to that provided
by model 1, F will have an F distribution defined by the degrees of freedom,
(p₂ − p₁) and (n − p₂). To reject the null hypothesis, F must exceed a critical value that
depends upon the degrees of freedom and the level of confidence.

[Fe(2,2′-bipyridine)₃]²⁺ experimental data modelling. Using the reference difference
spectra with the kinetic model, we fit the time-dependent difference Kβ fluorescence
spectra for optically excited [Fe(2,2′-bipyridine)₃]²⁺ in water. The parameters
extracted from the fit of the two kinetic models can be found in Extended Data Table 1.
We compute the time constants and uncertainties reported in Extended Data Table 1
by fitting multiple runs of the same experiment and then calculating the mean and the
standard deviation. The experimental two-dimensional transient difference spectra,
fit spectra, residuals, and excited electronic state populations extracted from the best
fit for each model can be found in Extended Data Figs 2 and 3. Given the very short
lifetime of the ³T excited state, the deviations between the fits of the two models
predominantly occur within the first 500 fs. The two-dimensional plots of the residuals
in Extended Data Figs 2c and 3c highlight the regions where the ¹,³MLCT→³T→⁵T₂
model provides a fit superior to that of the ¹,³MLCT→⁵T₂ model. Unsurprisingly,
this corresponds to time delays with larger ³T populations and spectral regions with
the largest difference between the ³T and ⁵T₂ spectra (7,053-7,056 eV).

The residual sum of squares quantifies the difference in quality between the fits. The
residual sum of squares for each model is: RSS₁ = 3.77 and RSS₂ = 3.21. In this situation, we
have p₁ = 5, p₂ = 6 and n = 45. To be 95% confident that the complex model is better
than the nested model, the calculated F value must be larger than the F distribution
value that captures 95% of the distribution for F(p₂ − p₁, n − p₂), which is 4.09. The
calculated F value is 6.71, which exceeds 4.09. So with 95% confidence we conclude
that the model containing the ³T transient provides a better description of the
experimental data.
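
The comparison can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name, using only the rounded RSS values quoted in the text and the critical value of 4.09 quoted for F(1, 39). Because the inputs are rounded to two decimals, the result differs slightly from the reported F = 6.71 (and from the F = 5.98 reported for RSS₂ = 3.27 in the instrument-response discussion), presumably because the reported values were computed from unrounded RSS.

```python
def f_statistic(rss1, rss2, p1, p2, n):
    """F statistic for two nested least-squares models.

    rss1, p1: residual sum of squares and parameter count of the simpler model.
    rss2, p2: the same for the more complex model (p2 > p1).
    n: number of fitted data points.
    """
    if not (p2 > p1 and n > p2):
        raise ValueError("require p2 > p1 and n > p2")
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


F_CRITICAL_95 = 4.09  # value quoted in the text for F(1, 39) at 95% confidence

f_best = f_statistic(3.77, 3.21, 5, 6, 45)        # best-fit IRF for the sequential model
f_suboptimal = f_statistic(3.77, 3.27, 5, 6, 45)  # IRF optimized for the direct model

for label, value in (("best fit", f_best), ("sub-optimal IRF", f_suboptimal)):
    verdict = "reject" if value > F_CRITICAL_95 else "cannot reject"
    print(f"{label}: F = {value:.2f}, {verdict} the null hypothesis at 95%")
```

With these rounded inputs both F values remain well above the critical value, so the conclusion in the text does not depend on the rounding.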

Influence of instrument response function parameters on the data analysis.
We treat the instrument response function (IRF) as a variable since the technology
does not yet exist to measure the instrument response time accurately. This
increases the number of parameters in the data analysis, which makes statistically
differentiating the robustness of alternative kinetic models more difficult, rather
than easier.

To ensure that the statistical superiority of the kinetic model possessing the
³T transient does not result from our uncertainty about the instrument response
function parameters, we have investigated how variation of time zero and FWHM
values differentially influence the RSS for the direct ¹,³MLCT→⁵T₂ model and the
¹,³MLCT→³T→⁵T₂ model. For the range of time zero and FWHM values reported
in Extended Data Table 1 that adequately fit the experimental data with either
model, the model containing the ³T transient always provides a significantly superior
fit to the experimental data. We have used the instrument response function
values that minimize the RSS for the ¹,³MLCT→⁵T₂ model to fit the data with the
¹,³MLCT→³T→⁵T₂ model. Using this sub-optimal instrument response function
only increases RSS₂ from 3.21 to 3.27, both significantly less than the direct
model RSS₁ = 3.77. Using the definition for F given above and p₁ = 5, p₂ = 6 and
n = 45, we calculate F = 5.98, in excess of the 4.09 value needed to conclude with
95% confidence that the complex model provides a better representation of the
experimental data than the nested model.

Experimental time resolution can also influence the ability to identify a distinct
excited state. For the case of the triplet transient, the temporal resolution of 150 fs has
little impact on the characterization of the triplet excited state dynamics. To demonstrate
that the roughly 150 fs FWHM IRF does not inhibit our ability to characterize
the triplet population dynamics, we have simulated the ¹,³MLCT→³T→⁵T₂
population kinetics using the time constants extracted from the best fit to the
experimental data listed in Extended Data Table 1 with an IRF possessing a 150 fs
FWHM and a 5 fs FWHM. The initial time dependence of the ¹,³MLCT state signal
depends significantly on the time resolution (though the decays for time delays
longer than 200 fs look similar), but the shape and amplitude of the triplet population
is similar. The convolution of the IRF and the lifetime of the ¹,³MLCT excited
state determine the time dependence of the ³T transient state formation observed
experimentally. The low transient population of the triplet state results primarily
from the fact that the decay rate of the ³T state exceeds that of the ¹,³MLCT state by a
factor of two.
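
The effect of the two-to-one ratio of decay rates on the triplet population can be illustrated with the sequential model in the limit k₃ ~ 0. The Python sketch below is an assumption-laden illustration rather than the authors' simulation: the function names are hypothetical, the populations are the standard analytic solution for two consecutive first-order steps, the lifetimes are the fitted values from Extended Data Table 1, and the instrument response is deliberately left out, so the numbers describe the ideal population and not the measured signal.

```python
import math


def sequential_populations(t, tau1, tau2):
    """Fractional populations (MLCT, 3T, 5T2) at delay t >= 0 for
    MLCT -> 3T -> 5T2 with lifetimes tau1 and tau2 and no decay of 5T2."""
    if tau1 == tau2:
        raise ValueError("this closed form requires tau1 != tau2")
    k1, k2 = 1.0 / tau1, 1.0 / tau2
    mlct = math.exp(-k1 * t)
    triplet = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    quintet = 1.0 - mlct - triplet
    return mlct, triplet, quintet


def triplet_peak(tau1, tau2):
    """Delay at which the 3T population is largest, and its value there."""
    k1, k2 = 1.0 / tau1, 1.0 / tau2
    t_peak = math.log(k2 / k1) / (k2 - k1)
    return t_peak, sequential_populations(t_peak, tau1, tau2)[1]


if __name__ == "__main__":
    # Lifetimes in fs: 1/k1 = 150 and 1/k2 = 70 (Extended Data Table 1).
    t_peak, p_peak = triplet_peak(150.0, 70.0)
    print(f"3T population peaks near {t_peak:.0f} fs at about {p_peak:.2f}")
```

Because the ³T state empties roughly twice as fast as it fills, its population in this idealized picture never exceeds about a quarter of the initially excited molecules.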

Extended Data Figure 1 | Experimental and calculated Kβ fluorescence
spectra for triplet spin states. a, The calculated Kβ fluorescence spectra of iron
complexes: triplet Fe(II) in square planar crystal field (red) (calculation
parameters based on Fe(II)phthalocyanine), and triplet excited state in an
octahedral crystal field (blue) (calculation parameters based on
[Fe(2,2′-bipyridine)₃]²⁺). b, The experimental Kβ fluorescence difference spectrum
(red) obtained by subtracting the singlet [Fe(2,2′-bipyridine)₃]²⁺ spectrum
from the triplet Fe(II)phthalocyanine spectrum, and the calculated Kβ
fluorescence difference spectrum (blue) generated by subtracting the spectrum
of the singlet state in an octahedral crystal field from the triplet state in a square
planar crystal field. Axes: normalized ΔI against emission energy (7,040-7,075 eV).

Extended Data Figure 2 | Time-dependent Kβ fluorescence spectra and fit
using the sequential kinetic model with a triplet transient. a, Experimental
transient fluorescent amplitude difference spectra plotted with arbitrary
units, and b, fit using the sequential kinetic model with a triplet transient.
c, Residuals for the best fit, with the colour-scale maximum and minimum set to
one-fifth of the value used in a and b. d, The excited state populations extracted
from the best fit.

Extended Data Figure 3 | Time-dependent Kβ fluorescence spectra and fit
using the direct kinetic model without a triplet transient. a, Experimental
transient fluorescent amplitude difference spectra plotted with arbitrary
units, and b, fit using the direct kinetic model without a triplet transient.
c, Residuals for the best fit, with the colour-scale maximum and minimum set to
one-fifth of the value used in a and b. d, The excited state populations extracted
from the best fit.

Extended Data Figure 4 | The 50 fs time delay normalized Kβ fluorescent
amplitude difference spectrum (ΔI) and kinetic model fit plotted as a
function of X-ray emission energy. The measured data (black circles and
line), along with the best global fit from the sequential kinetic model with a
transient triplet state (red line).

Extended Data Figure 5 | Absorption spectrum and pump power
dependence measurements. a, The ultraviolet-visible absorption spectrum
of [Fe(2,2′-bipyridine)₃]²⁺ in water. b, Power (fluence) dependence of the
change in probe transmission measured at 520 nm, following excitation of an
aqueous solution of [Fe(2,2′-bipyridine)₃]Cl₂ with a 520 nm pump pulse.
The figure shows the change in transmission (ΔT) measured at a 10 ps time
delay, a time long compared to the spin crossover and vibrational cooling
timescales, but short compared to the lifetime of the high-spin excited state.

Extended Data Table 1 | Fitted model parameters

Kinetic model               Lifetime     Lifetime     Time zero   Instrument response
                            1/k₁ (fs)    1/k₂ (fs)    t₀ (fs)     σ (fs)     FWHM (fs)
with triplet transient      150 ± 50     70 ± 30      0 ± 7       56 ± 8     130 ± 20
without triplet transient   …            …            …           …          170 ± 15

Values shown are extracted from fits to sequential and direct spin crossover models for photo-excited [Fe(2,2′-bipyridine)₃]²⁺ in water. We compute the time constants and uncertainties by fitting six runs of the
same experiment and then calculating the mean and standard deviation.

The poleward migration of the location of tropical
cyclone maximum intensity


James P. Kossin, Kerry A. Emanuel & Gabriel A. Vecchi


Temporally inconsistent and potentially unreliable global historical
data hinder the detection of trends in tropical cyclone activity. This
limits our confidence in evaluating proposed linkages between observed
trends in tropical cyclones and in the environment. Here we
mitigate this difficulty by focusing on a metric that is comparatively
insensitive to past data uncertainty, and identify a pronounced poleward
migration in the average latitude at which tropical cyclones have
achieved their lifetime-maximum intensity over the past 30 years. The
poleward trends are evident in the global historical data in both the
Northern and the Southern hemispheres, with rates of 53 and 62 kilometres
per decade, respectively, and are statistically significant. When
considered together, the trends in each hemisphere depict a global-average
migration of tropical cyclone activity away from the tropics
at a rate of about one degree of latitude per decade, which lies within
the range of estimates of the observed expansion of the tropics over
the same period. The global migration remains evident and statistically
significant under a formal data homogenization procedure,
and is unlikely to be a data artefact. The migration away from the
tropics is apparently linked to marked changes in the mean meridional
structure of environmental vertical wind shear and potential
intensity, and can plausibly be linked to tropical expansion, which is
thought to have anthropogenic contributions.

Inconsistencies in the historical global ‘best-track’ data can introduce
substantial uncertainty into global-mean measures of tropical cyclone
activity. Since the introduction of geostationary weather satellites in the
mid to late 1970s, measures of tropical cyclone frequency are generally
considered to be accurate, and there is no observed trend in global frequency
since that time. Comparatively, measures of tropical cyclone
intensity are considered to be highly uncertain in the global data. Consequently,
storm duration is also uncertain because identifying the moment
when a cyclone forms (cyclogenesis) requires accuracy in intensity
estimates, as the definition of cyclogenesis is entirely dependent on a
nascent storm’s intensity reaching a formally specified threshold. Similar
uncertainty exists in identifying a cyclone’s demise (cyclolysis). These
uncertainties can project onto metrics such as power dissipation and
accumulated cyclone energy, which are amalgamations of frequency,
duration and intensity.

But measurements of a storm’s position taken around the time that it
reaches its lifetime-maximum intensity (LMI) are much less uncertain.
By this time in a storm’s evolution, it is more likely to have been detected
and to be under close observation. Measurements of storm position at
the time of LMI are also less sensitive to inaccuracy in measurements of
intensity, as well as to known interregional differences in wind-averaging
techniques, because determining the absolute LMI is not critical: it is
necessary only to know that the intensity has peaked. This also makes
measurements of storm position at the time of LMI comparatively insensitive
to temporal heterogeneity in the historical best-track intensity
record. It is this heterogeneity that has presented substantial challenges
to trend detection in tropical cyclone metrics that require absolute measures
of intensity.


Here we consider the 31-yr period 1982-2012. In this period, the global
best-track data are considered most complete and at their highest quality
in each basin, and storm position is well monitored globally by geostationary
satellites. This period also encompasses a recent satellite-based
global tropical cyclone intensity reanalysis, and is the interval over which
the atmospheric reanalysis products that provide information on the
environmental changes that affect tropical cyclones are most reliable.

When the annual-mean latitude of LMI is calculated from the best- 
track data in the Northern and Southern hemispheres over this period 
(Fig. 1a, b, red lines), there are clear and statistically significant pole- 
ward trends in both hemispheres of 53 and 62 km per decade, respect- 
ively (Table 1). The positive contribution to these hemispheric trends 
from each ocean basin except that of the North Indian Ocean (Table 1 
and Extended Data Fig. 1) suggests that the migration away from the 
tropics is a global phenomenon, although there are large regional dif- 
ferences in the trend amplitudes and their statistical power. These dif- 
ferences are probably due, in part, to regional differences in interannual 
to multidecadal variability. The largest contribution to the Northern
Hemisphere trend is from the western North Pacific Ocean, which is also 
the most active basin in terms of annual tropical cyclone frequency. By 
contrast, the North Indian Ocean has the lowest mean annual frequency, 
and the small equatorward trend there has a much lesser effect on the 
hemispheric trend. The North Atlantic Ocean and eastern North Pacific 
exhibit small poleward trends and also contribute little to the hemi- 
spheric trend. In the Southern Hemisphere, both the South Pacific and 
the South Indian Ocean regions contribute substantially to the poleward 
trend. 

Within the period 1982-2009, the latitude of LMI can be reanalysed 
using a globally homogenized record of intensity (ADT-HURSAT). When
this is done, the annual-mean time series exhibit similar variability and
trends (Fig. 1a, b, blue lines), although the ADT-HURSAT-based trend
has a greater amplitude than the best-track-based trend in the North- 
ern Hemisphere and a lesser amplitude than the best-track-based trend 
in the Southern Hemisphere (Table 1), where the trend is no longer sig- 
nificant with 95% confidence. However, when both hemispheres are 
considered together they depict a global migration away from the deep 
tropics, and the best-track and ADT-HURSAT data exhibit similar pole- 
ward trends of 115 and 118 km per decade, respectively (Table 1). In 
this global view, the trends in the best-track and ADT-HURSAT data 
are consistent and both are statistically significant. 

As found with the best-track data, the ADT-HURSAT-based time 
series exhibit large differences in trend amplitudes and statistical power 
when separated by ocean basin (Table 1 and Extended Data Fig. 2). The 
western North Pacific is the largest contributor to the trend in the North- 
ern Hemisphere, and the eastern North Pacific also contributes signif- 
icantly, unlike the best-track data from that region. The equatorward 
trend in the North Indian Ocean best-track data is not found in the ADT- 
HURSAT data, which shows essentially no trend in that region (the lack 
of any poleward trend in the North Indian Ocean might be expected 
given the confines of the basin and the close proximity of land to the 
north). In the North Atlantic, the best-track and ADT-HURSAT data
sets both show essentially no trend. There are similar poleward trends
in the best-track and ADT-HURSAT data from the South Pacific, but
the ADT-HURSAT data in the South Indian Ocean exhibits a smaller,
statistically insignificant trend.

Figure 1 | Poleward migration of the latitude of LMI away from the tropics.
a, b, Time series of annual-mean latitude of tropical cyclone LMI calculated
from the best-track historical data (red) and the ADT-HURSAT reanalysis
(blue) in the Northern (a) and Southern (b) hemispheres. c, The annual-mean
difference between a and b shows the global migration of the latitude of
LMI away from the tropics. Linear trend lines are shown with their 95%
two-sided confidence intervals (shaded). Note that the y axis in b
increases downwards.

Table 1 | Linear trends, by region, of annual-mean latitude of LMI

Although regional differences are evident, the migration of the mean
latitude of LMI away from the deep tropics is observed in both hemispheres,
which indicates that this is a global phenomenon. The genesis
and subsequent intensification period of tropical cyclones, which precedes
LMI and controls when and where LMI occurs, is strongly modulated
by the environment that the storms move through in this period.
Known major factors controlling tropical cyclone evolution are the
environmental vertical wind shear and the potential intensity. Potential
intensity describes the thermodynamically based maximum tropical cyclone
intensity that the environment will support, all other factors being
optimal. Vertical wind shear is one of the key factors that inhibit a storm
from achieving this maximum. Greater shear and lesser potential intensity
each inhibits genesis and intensification, and vice versa, and increased
shear in the deep tropics, decreased shear at higher latitudes, or both,
can thus be plausibly linked to a poleward migration of the latitude of
LMI. Decreased potential intensity in the deep tropics, increased potential
intensity at higher latitudes, or both, could be expected to result
in a similar migration. Here we explore these environmental factors using
three different atmospheric reanalysis products, NCEP/NCAR,
ERA-Interim and MERRA. All three products exhibit broad regions of
increased shear in the deep tropics and decreased shear in the subtropics
(Fig. 2), which is consistent with the observed changes in the tropical
cyclones. The changes in mean potential intensity are not as consistent
among the different reanalysis products, particularly in the tropics,
which is probably a result of spurious differences in upper tropospheric
temperatures. However, the meridional structure of potential intensity
change is generally consistent in showing greater increases at higher
latitudes, and the MERRA data, in particular, also show a broad reduction
of potential intensity in the deep tropics.

The observed changes in shear and potential intensity provide evid- 
ence that the global migration of tropical cyclones away from the tropics 
is being modulated by systematic environmental changes. Shifts in trop- 
ical cyclone tracks in most regions have also been linked to phase changes 
in El Niño/Southern Oscillation (ENSO), which can potentially contribute
to the poleward trends in LMI identified here. To test this, we
decrease the contribution of ENSO by regressing the LMI latitude time 
series onto an index of ENSO variability. When this is done (Fig. 3), the 
amplitude of the interhemispheric migration rates is found to decrease 
only slightly in both the best-track and the ADT-HURSAT data, and 
the statistical power of the trends in fact increases. This makes it un- 
likely that natural ENSO variability has a role in the observed multi- 
decadal poleward migration of LMI, although it plays a substantial part 
in its interannual variability. 

The potential for contributions from natural variability occurring on 
decadal or longer timescales still exists, but quantifying this is difficult 
using relatively short observation records. We propose that there is a link- 
age between the poleward migration of LMI and the observed expan- 
sion of the tropics. The rate of expansion since 1979 varies considerably 
among existing studies, but the rate of LMI migration identified here
falls well within this range. This potential linkage between tropical cy- 
clones and the expansion of the tropics further heightens interest in 
establishing the forcing mechanisms of the expansion, which are at present
uncertain but are generally thought to have anthropogenic contributions.
The expansion of the tropics, as measured by the meridional
extent of the tropical Hadley circulation, exhibits a step change in the late
1990s. Formal change-point analysis applied to the global time series
of LMI latitude reveals a significant change point in 1996, providing
further support for a linkage between the two independently observed
phenomena.

Observed changes in vertical wind shear and potential intensity over
the past 30 yr seem to have resulted in a poleward shift, in both hemispheres,
of the regions most favourable for tropical cyclone development
(Fig. 2), and an associated migration of tropical cyclone activity away
from the tropics (Fig. 1). If these environmental changes continue, a concomitant
continued poleward migration of the latitude where tropical
cyclones achieve their LMI would have potentially profound consequences
for life and property. Any related changes in positions where
storms make landfall will have obvious effects on coastal residents and
infrastructure. Increasing hazard exposure and mortality risk from tropical
cyclones may be compounded in coastal cities outside the tropics,
while being offset at lower latitudes. Tropical cyclones also have an important
role in maintaining regional water resources, and a poleward
migration of storm tracks could threaten potable water supplies in some
regions while increasing flooding events in others. Given these motivating
factors, further study of the poleward migration of tropical cyclone
LMI identified here, and its potential link to the observed expansion of
the tropics, is warranted.

              NHEM       SHEM       NATL        WPAC        EPAC       NIO         SIO        SPAC       Global
Best track    +53 ± 43   +62 ± 48   +7 ± 98     +37 ± 55    +10 ± 32   −25 ± 78    +67 ± 55   +51 ± 68   +115 ± 70
ADT-HURSAT    +83 ± 50   +35 ± 44   −12 ± 126   +105 ± 71   +34 ± 30   +10 ± 106   +30 ± 52   +54 ± 79   +118 ± 70

Trends are deduced from the best-track and ADT-HURSAT data sets. The slope (kilometres per decade) and the 95% two-sided confidence bounds are shown. Positive slopes represent poleward migration. NHEM,
Northern Hemisphere; SHEM, Southern Hemisphere; NATL, North Atlantic; WPAC, western Pacific; EPAC, eastern Pacific; NIO, North Indian Ocean; SIO, South Indian Ocean; SPAC, South Pacific.

Figure 2 | Observed changes in the mean environment where tropical
cyclones form and track. Percentage changes from 1980-1994 to 1995-2010
in mean vertical wind shear (a, c, e) and potential intensity (b, d, f). Annual
means are taken over the peak tropical cyclone seasons in each hemisphere
(August-October in the north and January-March in the south) from three
different reanalysis products: MERRA (a, b), ERA-Interim (c, d) and
NCEP/NCAR (e, f). Axis: meridional distance from Equator (10³ km).

Figure 3 | Global trends of the latitude of LMI with ENSO variability
reduced. Time series of the latitude of LMI calculated from the best-track
historical data (red; trend, 99 ± 59 km per decade) and the ADT-HURSAT
reanalysis (blue; trend, 107 ± 53 km per decade) with ENSO variability
reduced. The values are calculated from residuals of the regression of LMI
latitude onto an index of ENSO variability. Shading represents the 95%
two-sided confidence interval of the trend.


METHODS SUMMARY 


Best-track data were taken from the International Best Track Archive for Climate
Stewardship (IBTrACS) v03r05 (ref. 28). Following ref. 3, when a storm has overlapping
data from multiple sources, we used the source with the greatest reported
LMI. The homogenized intensity data were taken from the Advanced Dvorak Technique
Hurricane Satellite (ADT-HURSAT) data set. Vertical wind shear and potential
intensity were calculated over water in the region spanning latitudes 35° S to
35° N. In the Southern Hemisphere, the longitude was confined to 30°-240° E, which
excludes the region where storms are not observed to form or track. The wind shear
is estimated as the magnitude of the vector difference of the respective horizontal
wind velocities at the 250- and 850-hPa pressure levels. Potential intensity was calculated
following ref. 29. ENSO variability was decreased in the LMI latitude time
series by regressing the series from the two hemispheres onto the Niño-3.4 index
averaged over the most active periods of tropical cyclone activity (August-October
in the north and January-March in the south), and analysing the residuals. None of
the time series studied in this paper exhibited autocorrelation after detrending, as
determined with the Durbin-Watson test statistic, and no corrections were necessary
when calculating the confidence intervals.
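
The shear definition used here, the magnitude of the vector difference between the horizontal winds at 250 and 850 hPa, can be written in a few lines. The following Python sketch is illustrative only: the function name and the (u, v) component arguments are assumptions, and in practice the calculation would be applied at every grid point of the monthly-mean reanalysis fields.

```python
import math


def vertical_wind_shear(u250, v250, u850, v850):
    """Magnitude of the vector difference between the horizontal wind at
    250 hPa and at 850 hPa. Inputs are zonal (u) and meridional (v)
    components in the same units (for example m/s)."""
    return math.hypot(u250 - u850, v250 - v850)


if __name__ == "__main__":
    # Hypothetical winds: 10 m/s westerly aloft, 7 m/s zonal and 4 m/s meridional below.
    print(vertical_wind_shear(10.0, 0.0, 7.0, 4.0))
```

Note that this is the magnitude of the difference of the two wind vectors, which is not the same as the difference of the two wind speeds.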


METHODS


Best-track data were taken from the International Best Track Archive for Climate
Stewardship (IBTrACS) v03r05 (ref. 28) and are available at http://www.ncdc.noaa.gov/ibtracs/.
Following ref. 3, when a storm has overlapping data from multiple
sources, we used the source with the greatest reported LMI. In storms that achieve
their LMI more than once, the latitude of LMI is taken at the first occurrence. The
homogenized intensity data were taken from the Advanced Dvorak Technique
Hurricane Satellite (ADT-HURSAT) data set. The data reflect the additional
homogenization procedure addressing the discontinuity in satellite coverage that occurred
in 1997. The global distribution of the ADT-HURSAT LMI is known to be spuriously
leptokurtic, which is the likely cause of the consistent equatorward bias in
the mean latitude of LMI when compared to the best-track data, but there is no
expectation that this bias has any time dependence and it is not expected to affect the
trends.

ENSO variability was removed from the LMI latitude time series by regressing the 
individual series from the Northern and Southern hemispheres onto the Niño-3.4
index averaged over the most active periods of tropical cyclone activity (August-
October in the north and January-March in the south), and analysing the residuals. 
The index is available at http://www.cpc.ncep.noaa.gov/data/indices/. 
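
The regression step described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit with an intercept; the function name `regress_out` and the sample numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean

def regress_out(series, index):
    """Residuals of `series` after ordinary least-squares regression onto `index`."""
    mean_y, mean_x = fmean(series), fmean(index)
    sxx = sum((x - mean_x) ** 2 for x in index)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(index, series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # What is left after removing the part linearly explained by the index.
    return [y - (intercept + slope * x) for x, y in zip(index, series)]

# Made-up numbers for illustration only (not data from the paper).
nino34 = [-1.2, 0.4, 1.5, -0.3, 0.9, -0.8]
lmi_latitude = [19.1, 20.6, 21.4, 19.8, 20.9, 19.5]
residuals = regress_out(lmi_latitude, nino34)
```

By construction the residuals have zero mean and are uncorrelated with the index, so any remaining trend cannot be attributed to a linear ENSO signal.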

None of the time series explored in this paper exhibits autocorrelation after
detrending, as determined with the Durbin-Watson test statistic, and no corrections
were necessary when calculating the confidence intervals. In addition to linear trend
analysis, the time series were explored for change points with models based on batch
detection using both the Student t and Mann-Whitney statistics to test for
significance at 95% confidence or greater using the ‘cpm’ package in the software
environment R.
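
For readers unfamiliar with the Durbin-Watson statistic, a minimal sketch of its definition is given here. The analysis in the paper was done in R; this Python function is only an illustration, and its name is an assumption.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Perfectly alternating residuals (strong negative autocorrelation).
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
# Constant residuals (strong positive autocorrelation).
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```

The statistic lies between 0 and 4. Values near 2 indicate little first-order autocorrelation, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation.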

The global trends in the annual mean latitude of LMI are a result of both
intrabasin and interbasin changes. The climatological mean latitude of LMI varies by
basin (see, for example, Extended Data Fig. 1) such that, in addition to meridional
shifts within each basin, changes in the relative annual frequency of storms from
each basin can also contribute to the global trends in the latitude of LMI. To
quantify this contribution, the LMI latitude of every storm was normalized by the
respective basin-mean LMI latitude, and the analysis of Fig. 1c was repeated. When this
was performed, the trend in the best-track data decreased from 115 ± 70 to 78 ± 66 km
per decade and the trend in the ADT-HURSAT data decreased from 118 ± 70 to 92 ± 65 km
per decade. Thus, both factors contribute, but the intrabasin poleward migration of
LMI dominates the trends.

Monthly-mean vertical wind shear and potential intensity were calculated over 
water in the region spanning latitudes 35° S-35° N. In the Southern Hemisphere, the 
longitude was confined to 30°-240° E, which excludes the region where storms are 
not observed to form or track. Vertical wind shear was calculated as

$$
\mathrm{shear} = \Big\{ (\bar{u}_{250} - \bar{u}_{850})^2 + (\bar{v}_{250} - \bar{v}_{850})^2 + \overline{(u'_{250})^2} + \overline{(v'_{250})^2} + \overline{(u'_{850})^2} + \overline{(v'_{850})^2} - 2\big(\overline{u'_{250}u'_{850}} + \overline{v'_{250}v'_{850}}\big) \Big\}^{1/2}
$$

where $u_{250}$, $u_{850}$, $v_{250}$ and $v_{850}$ are the zonal (u) and meridional (v) winds at the
250- and 850-hPa pressure levels; $u'_{250}$, $u'_{850}$, $v'_{250}$ and $v'_{850}$ are departures of the
corresponding daily means from their monthly means; and overbars represent
monthly-mean quantities. Potential intensity was calculated following ref. 29. In the
Northern Hemisphere the shear and potential intensity were averaged over August- 
October, and in the Southern Hemisphere they were averaged over January-March. 
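
A shear expression built from monthly means plus variances and covariances of the daily departures is algebraically the square root of the monthly-mean squared daily shear vector. The sketch below checks that identity numerically. It assumes equally weighted daily values within a month; the function names and wind values are hypothetical.

```python
from math import sqrt
from statistics import fmean

def monthly_shear(u250, u850, v250, v850):
    """Shear from monthly means plus (co)variances of daily departures."""
    ub250, ub850 = fmean(u250), fmean(u850)
    vb250, vb850 = fmean(v250), fmean(v850)

    def cov(a, a_bar, b, b_bar):
        # Monthly mean of the product of daily departures.
        return fmean([(x - a_bar) * (y - b_bar) for x, y in zip(a, b)])

    total = ((ub250 - ub850) ** 2 + (vb250 - vb850) ** 2
             + cov(u250, ub250, u250, ub250) + cov(v250, vb250, v250, vb250)
             + cov(u850, ub850, u850, ub850) + cov(v850, vb850, v850, vb850)
             - 2.0 * (cov(u250, ub250, u850, ub850) + cov(v250, vb250, v850, vb850)))
    return sqrt(total)

def rms_daily_shear(u250, u850, v250, v850):
    """Square root of the monthly mean of the squared daily shear vector."""
    return sqrt(fmean([(a - b) ** 2 + (c - d) ** 2
                       for a, b, c, d in zip(u250, u850, v250, v850)]))

# Made-up daily winds in m/s for illustration only.
u250 = [12.0, 15.5, 9.0, 14.0]
u850 = [3.0, 4.5, 2.0, 5.0]
v250 = [1.0, -2.0, 0.5, 3.0]
v850 = [0.5, 0.0, -1.0, 1.5]
shear_from_formula = monthly_shear(u250, u850, v250, v850)
shear_from_daily = rms_daily_shear(u250, u850, v250, v850)
```

The two results agree to rounding error, which shows why the departure terms matter: shear computed from monthly-mean winds alone would omit the contribution of day-to-day variability.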
Anthropogenic electromagnetic noise disrupts
magnetic compass orientation in a migratory bird


Svenja Engels, Nils-Lasse Schneider, Nele Lefeldt, Christine Maira Hein, Manuela Zapka, Andreas Michalik,
Dana Elbers, Achim Kittel, P. J. Hore & Henrik Mouritsen


Electromagnetic noise is emitted everywhere humans use electronic
devices. For decades, it has been hotly debated whether man-made electric
and magnetic fields affect biological processes, including human
health. So far, no putative effect of anthropogenic electromagnetic
noise at intensities below the guidelines adopted by the World Health
Organization has withstood the test of independent replication
under truly blinded experimental conditions. No effect has therefore
been widely accepted as scientifically proven. Here we show
that migratory birds are unable to use their magnetic compass in the 
presence of urban electromagnetic noise. When European robins, 
Erithacus rubecula, were exposed to the background electromag- 
netic noise present in unscreened wooden huts at the University of 
Oldenburg campus, they could not orient using their magnetic com- 
pass. Their magnetic orientation capabilities reappeared in electrically 
grounded, aluminium-screened huts, which attenuated electromag- 
netic noise in the frequency range from 50 kHz to 5 MHz by approx- 
imately two orders of magnitude. When the grounding was removed 
or when broadband electromagnetic noise was deliberately generated 
inside the screened and grounded huts, the birds again lost their mag- 
netic orientation capabilities. The disruptive effect of radiofrequency 
electromagnetic fields is not confined to a narrow frequency band 
and birds tested far from sources of electromagnetic noise required 
no screening to orient with their magnetic compass. These fully double- 
blinded tests document a reproducible effect of anthropogenic elec- 
tromagnetic noise on the behaviour of an intact vertebrate. 

For more than 50 years, it has been known that night-migratory songbirds
can use the Earth’s magnetic field to orient spontaneously in their
migratory direction when placed in an orientation cage at night in spring
and autumn. This basic experiment has been independently replicated
many times in various locations. We were therefore puzzled to find that
night-migratory songbirds tested between autumn 2004 and autumn 
2006 in wooden huts on the University of Oldenburg campus (53.1507° 
N, 8.1648° E) seemed unable to orient in the appropriate migratory direc- 
tion. Typical data for European robins are shown in Fig. 1a. 

Noting that Ritz et al. had reported the sensitivity of European
robins to radiofrequency magnetic fields, in the winter of 2006/2007 we 
decided to reduce the electromagnetic noise in our test huts by screen- 
ing them with electrically connected and grounded aluminium plates 
(Extended Data Fig. 1). The screening left static magnetic fields such as 
the Earth’s completely unaffected, but attenuated the electromagnetic 
noise inside the huts in the frequency range from about 50 kHz to at least 
20 MHz by about two orders of magnitude (Fig. 1c, d and Methods). 
The effect on the birds’ orientation capabilities was profound: with the 
aluminium screens in place, the birds oriented in their normal migratory 
direction the following spring (2007; Fig. 1b) and in subsequent years 
(data in references 12-15). When the horizontal component of the static 
magnetic field was rotated 120° anticlockwise or when the vertical component
was inverted, the birds changed their orientation as expected.


These observations suggested that, by chance, we could have discov- 
ered a biological system that is sensitive to man-made electromagnetic 
noise in the range up to 5 MHz with intensities well below the guide- 
lines for human exposure proposed by the International Commission 
on Non-Ionizing Radiation Protection (ICNIRP) and adopted by the
World Health Organization.

Any report of an effect of low-frequency electromagnetic fields on a
biological system should be subjected to particular scrutiny for at least
three reasons. First, such claims in the past have often proved difficult to
reproduce. Second, animal studies are commonly used to evaluate human
health risks and have contributed to guidelines for human exposures.
Third, “seemingly implausible effects require stronger proof”.

Figure 1 | Magnetic compass orientation of migratory European robins
tested at the University of Oldenburg requires aluminium screening. In
unscreened wooden huts, European robins were disoriented (a, spring 2005,
n = 21, mean direction 316°, mean vector length r = 0.19, P = 0.48 (Rayleigh
test)), but after installing grounded aluminium screens, the birds oriented
highly significantly towards North in spring (b, spring 2007, n = 34, mean
direction 356° ± 20° (95% confidence interval), r = 0.59, P < 0.001).
c, d, Anthropogenic electromagnetic noise in the huts before (red) and after
(blue) installation of screens. Traces c and d show the magnetic (B) and electric
(E) components of the measured electromagnetic fields, respectively, as a
function of frequency (f). In a, b, each dot indicates the mean orientation of
all the tests of one individual bird in the given condition. The dots are
colour-coded as in c, d. The arrows show group mean vectors flanked by their
95% confidence interval limits (solid lines). The dashed circles indicate the
minimum length of the group mean vector needed for significance according to
the Rayleigh test (inner circle, P = 0.05; middle, P = 0.01; outer, P = 0.001).
mN, magnetic North.
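
The orientation results in this paper are summarized by a group mean direction, a mean vector length r and a Rayleigh-test P value. The sketch below shows how such quantities can be computed. It uses a widely used closed-form approximation for the Rayleigh P value (attributed to Zar's Biostatistical Analysis); whether the authors used this approximation is an assumption, and the function names are hypothetical.

```python
from math import atan2, cos, degrees, exp, hypot, radians, sin, sqrt

def mean_vector(directions_deg):
    """Mean direction (degrees, 0-360) and mean vector length r of a set of headings."""
    n = len(directions_deg)
    c = sum(cos(radians(d)) for d in directions_deg) / n
    s = sum(sin(radians(d)) for d in directions_deg) / n
    return degrees(atan2(s, c)) % 360.0, hypot(c, s)

def rayleigh_p(n, r):
    """Approximate P value of the Rayleigh test for sample size n and vector length r."""
    big_r = n * r  # length of the resultant (unnormalized) vector
    return exp(sqrt(1.0 + 4.0 * n + 4.0 * (n * n - big_r * big_r)) - (1.0 + 2.0 * n))

# Two headings 90 degrees apart: mean direction 45 degrees, r about 0.707.
direction, length = mean_vector([0.0, 90.0])

# Sample sizes and vector lengths quoted for the unscreened and screened huts.
p_unscreened = rayleigh_p(21, 0.19)
p_screened = rayleigh_p(34, 0.59)
```

With the rounded values n = 21 and r = 0.19 the approximation gives P of about 0.47, close to the reported 0.48, and with n = 34 and r = 0.59 it gives a value far below 0.001. Small differences are expected because r is quoted to two decimals.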

Therefore, we systematically conducted a large number of double-blind 
experiments over the last 7 years to test whether the restored orientation 
inside the aluminium-screened buildings was really attributable to the 
reduced exposure to anthropogenic electromagnetic noise. To ensure 
that our results were reliable, different generations of students indepen- 
dently replicated several key measurements. We also consulted with lead- 
ing experts to ensure that we very carefully measured the electromagnetic 
fields experienced by the birds in each of the experimental conditions 
described below. Electromagnetic fields have magnetic and electric com- 
ponents, and, especially in the so-called ‘near-field’ (within a few wave- 
lengths of the source), they must be measured separately. 

First, our measurements showed that the aluminium shielding lost its ability to screen
anthropogenic electromagnetic noise when the grounding was discon- 
nected (Fig. 2e, f). We therefore performed a series of experiments in 
which we tested a group of birds alternately in two different, aluminium- 
screened, wooden huts; one grounded and one left ungrounded. The 
experimenters were unaware which hut was which. The results were 
striking: on the days when the birds were tested in a grounded hut, they 
oriented in their mean northerly migratory direction as expected in 
spring (Fig. 2a, c). By contrast, the same birds were randomly oriented 
on the days when they were tested in an ungrounded hut (Fig. 2b, d). 
Thus, we could control the orientation of the birds inside the huts by 
connecting or disconnecting the grounding of the aluminium screens 
(Fig. 2). 

Figure 2 | Connecting and disconnecting the grounding of the screens turns
on and off the birds’ magnetic compass orientation capabilities. When the
screens were grounded, European robins oriented significantly in their
migratory direction (a, spring 2008, n = 16, mean direction 341° ± 40°,
r = 0.45, P = 0.04), whereas they were randomly oriented when the grounding
was disconnected (b, spring 2008, n = 16, mean direction 230°, r = 0.22,
P = 0.47). In another set of identical tests, this pattern repeated itself
(c, grounded screens, spring 2008, n = 15, mean direction 348° ± 41°, r = 0.48,
P = 0.03; d, grounding disconnected, spring 2008, n = 14, mean direction 290°,
r = 0.12, P = 0.82). e, f, Magnetic and electric field intensities, respectively,
as a function of frequency when the screens were grounded (blue) or
ungrounded (red).

Second, we assessed whether the electromagnetic noise was directly
responsible for the disorientation. The birds were tested in the grounded
aluminium-screened huts in which they normally orient very well (Figs 1b,
2a, c and data in references 12-15). The birds became disoriented (Fig. 3a)
when we introduced broadband electromagnetic noise ranging from
2 kHz up to ca. 9 MHz (Fig. 3d, e and Extended Data Fig. 2) into the huts
at magnetic field intensities similar to those measured for the background
anthropogenic noise (Fig. 1c). To make sure that the observed effect was
not simply due to the presence of the signal generator and associated
electronics, we repeated these tests under identical conditions but with
the output of the signal generator reduced to the lowest possible amplitude
(Fig. 3d, e and Extended Data Fig. 2). In this condition, the birds
oriented in their migratory direction in spring (Fig. 3b) and reoriented
appropriately when the static magnetic field was rotated 120° anticlockwise
(Fig. 3c). Thus, the disorientation appears to be caused by the electromagnetic
noise, and not by the mere presence of the electronics.

Third, we assessed whether the effects are limited to a specific part of 
the radiofrequency spectrum. To answer this question, we tested Euro- 
pean robins inside the grounded, aluminium-screened huts and in the 
presence of deliberately introduced broadband electromagnetic noise 
either in the frequency range from ca. 20 kHz to 450 kHz or from ca. 
600 kHz to 3 MHz (Fig. 4f, g and Extended Data Fig. 2). As a control, we
tested the same birds exposed to very-low-amplitude broadband noise 
ranging from ca. 2 kHz to 9 MHz (Figs 3d, e, 4f, g and Extended Data 
Fig. 2) in which we had observed that the birds could orient (Fig. 3b, c). 
As expected, the control birds again oriented appropriately (Fig. 4d, e). 
By contrast, broadband electromagnetic noise in both of the above non- 
overlapping frequency bands prevented the birds from orienting using 
their magnetic compass (Fig. 4a-c). Thus, the effects are not limited to
one specific frequency or to one part of the radiofrequency spectrum. 

Figure 3 | Artificially produced broadband electromagnetic noise disrupts
the magnetic compass orientation of birds tested inside the grounded
aluminium-screened huts. Broadband, noise-modulated, electromagnetic
fields between 2 kHz and 5 MHz (red traces in d, e and Extended Data Fig. 2)
added inside the grounded screens resulted in disorientation of the birds
(a, autumn 2010, n = 22, mean direction 278°, r = 0.07, P = 0.91). When the
same equipment sent out the weakest possible broadband electromagnetic
noise (blue traces in d, e and Extended Data Fig. 2), the birds oriented
significantly towards North (b, spring 2011, n = 30, mean direction 354° ± 38°,
r = 0.39, P = 0.009) and turned their orientation appropriately when the static
magnetic field was rotated −120° (c, spring 2011, mN at 240°, n = 27, mean
direction 253° ± 38°, r = 0.41, P = 0.008). d, Magnetic field intensity. e, Electric
field intensity.

Figure 4 | The disruptive effect of broadband electromagnetic noise on
magnetic compass orientation is not limited to a single narrow frequency
range. Addition of broadband, noise-modulated, electromagnetic fields
between ca. 20 kHz and 450 kHz (green traces in f, g) inside the grounded
screens resulted in disorientation of the birds in the normal field (a, autumn
2011, n = 31, mean direction 306°, r = 0.24, P = 0.17) and in a field turned
−120° horizontally (b, autumn 2011, n = 27, mean direction 235°, r = 0.03,
P = 0.96). Broadband fields between ca. 600 kHz and 3 MHz (black traces in
f, g) also disoriented the birds (c, autumn 2011, n = 30, mean direction 108°,
r = 0.11, P = 0.70). When the same equipment sent out the weakest possible
broadband electromagnetic noise (blue traces in f, g), the birds showed
appropriately directed magnetic compass orientation (d, autumn 2011, n = 27,
mean direction 258° ± 37°, r = 0.42, P = 0.008), and responded to a −120°
horizontal rotation of the static field (e, autumn 2011, n = 26, mean direction
107° ± 32°, r = 0.51, P < 0.001). For comparison, the red traces in f, g show
the intensity of the strong 2 kHz-9 MHz broadband noise used for the
experiments presented in Fig. 3. f, Magnetic field intensity. g, Electric
field intensity.

The peak magnetic field intensity of the anthropogenic electromagnetic
noise at any single frequency measured on typical days around the
University of Oldenburg is on the order of 0.1-50 nT. The total time-dependent
magnetic field, summed over the frequency range 10 kHz-5 MHz, is much stronger
(on the order of at most 200-1,100 nT, see Extended Data Table 1), but still much
weaker than the Earth’s magnetic field (ca. 49,000 nT in Oldenburg). Ritz et al.
reported that the magnetic orientation capabilities of European robins in Frankfurt
were disabled by highly directional, monochromatic radiofrequency fields with
magnetic field intensities of 15 nT or more, but not at 5 nT under otherwise
identical conditions. Their birds were only disoriented at magnetic
intensities below ca. 100 nT when the radiofrequency matched the electron
Larmor frequency (1.315 MHz in Frankfurt; 1.363 MHz in Oldenburg),
that is, the resonance frequency of the spin of a free electron interacting
with the Earth’s magnetic field. Electromagnetic fields similar to those
used by Ritz et al. never occur in natural or urban environments. The
anthropogenic electromagnetic noise birds and other living beings experience
is not monochromatic, nor is it spatially or temporally coherent
(Fig. 1c, d). It has rapidly varying phases and directions and many different
frequencies are present simultaneously. The electromagnetic noise
we investigated is therefore fundamentally different from the conditions
used previously. Furthermore, our birds were never exposed to magnetic
fields stronger than 1 nT at 1.315 MHz or 1.363 MHz (Figs 1-5),
and two non-overlapping frequency ranges interfere with the birds’ ability
to use their magnetic compass (Fig. 4). Thus, the disruptive effect on
orientation is not limited to a specific resonance frequency. It is caused
by electromagnetic fields covering a much broader frequency range and
at a much lower intensity (ca. 1 nT at any single frequency) than suggested
previously. Most importantly, broadband anthropogenic electromagnetic
noise omnipresent in industrialized environments can lead
to disorientation. These results have several important implications.

Figure 5 | In a rural location, European robins show magnetic compass
orientation without screening. a, Orientation at the University campus
(same data as in Fig. 1a). b, Orientation at a rural location (spring 2011, n = 28,
mean direction 342° ± 32°, r = 0.47, P = 0.002) where the anthropogenic
electromagnetic noise was much weaker (blue traces in c, d) than at the
University (red traces in c, d). c, Magnetic field intensity. d, Electric
field intensity.

First, the present results could have significant consequences for migratory
bird conservation. Magnetic compass information is sensed by night-migratory
songbirds on the ground and in free flight, which mostly
takes place at altitudes below 1,000 m (ref. 19). So, if anthropogenic
electromagnetic fields prevent migratory songbirds from using their magnetic
compass, their chances of surviving the migratory journey might
be significantly reduced, in particular during periods of overcast weather
when sun and star compass information is unavailable. Night-migratory
songbird populations are declining rapidly, and anthropogenic electromagnetic
noise could be a previously neglected contributory factor.
Nevertheless, billions of migratory birds do find their way every year. 
It is therefore pertinent to ask, how localized is the disorienting effect 
of man-made electromagnetic noise? 

We therefore compared the orientation of our robins in the unscreened 
huts at the University site (Figs 1a and 5a) with their orientation in an 
unscreened wooden shelter located ca. 7.5 km from the University and 
ca. 1 km outside the Oldenburg city limit, where the anthropogenic elec- 
tromagnetic noise was much weaker (Fig. 5c, d) and similar in intensity 
to the electromagnetic noise remaining inside the grounded aluminium- 
screened huts (Fig. 1c, d, blue trace). In the rural setting, the birds could 
orient using their magnetic compass in the absence of screening (Fig. 5b). 
Thus, the disruptive effect of anthropogenic electromagnetic noise on 
the birds’ orientation capabilities appears to be restricted to urban loca- 
tions where there is typically a high usage of electronic devices. There- 
fore, the effect on wild birds is probably also quite localized. 

Second, the results presented here are likely to provide key insights
into the mechanism either of the magnetic compass sense or of some
important process that interferes with the birds’ orientation behaviour.
The biophysical mechanism that would allow such extraordinarily weak,
broadband electromagnetic noise to affect a biological system is far
from clear. The energies involved are tiny compared to the thermal
energy, kBT, but the effects might be explained if hyperfine interactions
in light-induced radical pairs or large clusters of iron-containing
particles are involved. It would be truly remarkable if electromagnetic
noise at the intensities studied here could be shown to disrupt the
operation of a radical pair sensor by modifying its quantum spin dynamics.
To be sensitive to such exceedingly weak magnetic fields, the electron
spin-decoherence would have to be orders of magnitude slower than is
currently thought possible. This intriguing prospect has attracted the
attention of quantum physicists eager to learn lessons from nature that
might ultimately allow more efficient quantum computers to be designed
and constructed. Furthermore, we cannot rule out that the birds might
be affected by the electric component of the electromagnetic noise, a
possibility that has not been considered previously.

15 MAY 2014 | VOL 509 | NATURE | 355
©2014 Macmillan Publishers Limited. All rights reserved

Last, but not least, using a double-blinded protocol we have documented
a clear and reproducible effect on a biological system of anthropogenic
electromagnetic fields much weaker than the current ICNIRP
guidelines: the reference levels for general public exposure to time-varying
magnetic fields in the relevant frequency band are 6,000 nT at
150 kHz decreasing to 180 nT at 5 MHz (refs 1, 2). The disruptive effects
we observe cannot be attributed to power lines (16.7 Hz or 50 Hz fields)
or to mobile phone signals (GHz frequencies) or to any other fields with
frequencies below 2 kHz or above 5 MHz because outside this range the
electromagnetic noise was of similar intensity in all conditions (Fig. 4
and Extended Data Fig. 2). Electromagnetic noise in the frequency band
2 kHz-5 MHz originates primarily from AM radio signals and from electronic
equipment running in University buildings, businesses and private
houses. The effects of these weak electromagnetic fields generated
by everyday human activity, however, are striking: they disrupt the function
of an entire sensory system in a higher vertebrate.


METHODS SUMMARY 


Essential methodological information needed for a basic understanding of the text 
has been woven into the main text at the appropriate places. The Methods section 
contains detailed information on the test subjects, electromagnetic shielding, exe- 
cution and analysis of behavioural experiments, production and measurement of 
static fields, generation of electromagnetic noise, measurement of time-dependent 
electromagnetic fields, and blinding procedures. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper.

Test subjects. In our study, we tested European robins caught on the campus of
the University of Oldenburg, Germany. The birds were housed indoors in indivi- 
dual cages in a windowless room under a light regime simulating the local photo- 
period. The tests were performed on the campus of the University of Oldenburg 
during the spring migratory seasons in 2005 (when we tested 22 birds), 2008 (18 
birds) and 2011 (30 birds) and during the autumn migratory seasons in 2010 (24 
birds) and 2011 (42 birds). The number of birds caught during the previous migra- 
tory seasons and the experimental facilities available for the specific experiment in 
the given season determined the choice of sample sizes. In addition to these
experiments, which were performed specifically for the present study, tests were also
conducted by various groups of students in spring 2007, spring 2008, autumn 2008
(tests with garden warblers, Sylvia borin), spring 2009, autumn 2009, autumn
2010, and spring 2011. These additional experiments, done primarily for other
studies that have already been published, included tests with control groups
which repeatedly confirmed and extended the results presented in Fig. 1, namely
that: (a) night-migratory songbirds orient properly using their magnetic compass
in the grounded and screened huts in the unchanged geomagnetic field, and
(b) they adjust their orientation appropriately when the horizontal component of the
static field is rotated by −120° (refs 12-15). Furthermore, in two previous studies
we tested groups of European robins in the screened and grounded huts while
exposing them to a static field the vertical component of which had been inverted,
leaving the horizontal component still pointing to the North. In this field, the
polarity of the field lines is unchanged and still points towards magnetic North,
but the axis of the static field lines is the same as if the static field had been turned
180° horizontally. Since these robins flipped their orientation ca. 180° (refs 12, 15),
the birds in the grounded and screened huts were using their standard magnetic
inclination compass. All animal procedures were approved by the Animal Care
and Use Committees of the Niedersächsisches Landesamt für Verbraucherschutz
und Lebensmittelsicherheit (LAVES, Oldenburg, Germany).

Static magnetic fields. Static magnetic fields were produced with double-wrapped, 
three-dimensional Merritt four-coil systems with an average coil dimension of 2 m.
All experiments were performed within the central space of the coils where the mag- 
netic field homogeneity was better than 99%. Before the beginning of each experi- 
ment, the ambient geomagnetic field was measured using a Flux-Gate magnetometer 
(FVM-400, Meda Inc.) in the centre and at the edges of the experimental volume 
within which the orientation funnels were placed. Birds were tested in two different 
static magnetic field conditions: in a magnetic field closely similar to the natural 
geomagnetic field in Oldenburg (normal magnetic field, NMF) and in a magnetic 
field of the same strength and inclination as the local geomagnetic field but rotated 
120° anticlockwise in the horizontal plane (changed magnetic field, CMF). To pro- 
duce the CMF condition, the appropriate currents ran through the two subsets of 
windings per axis of the three-axial, four-coil Merritt system in the same direction. 
In the NMF condition, the currents that were needed to produce the CMF con- 
dition ran through the two subsets of windings but in opposite directions so that 
no significant changes (that is, <10 nT) to the geomagnetic field were produced by
the coils. The actual fields experienced by the birds under the two magnetic field
conditions were as follows (mean ± standard deviation): NMF condition, 48,900 nT
± 150 nT; inclination, 67.7° ± 0.6°; horizontal direction, 360° ± 0.1°. CMF condition:
49,000 nT ± 470 nT; inclination, 68.0° ± 1.1°; horizontal direction, −120° ± 0.5°.

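
To give a feel for what the coils must do in the changed-field condition, the sketch below computes the extra field vector that, added to the ambient field, rotates its horizontal component by 120° anticlockwise while leaving total intensity and inclination unchanged. It is an idealized illustration, assuming the ambient field points exactly to magnetic North and that the coil field simply adds to it; the function names are hypothetical, and the nominal NMF values are used as inputs.

```python
from math import atan2, cos, degrees, hypot, radians, sin

def field_components(total_nt, inclination_deg, azimuth_deg):
    """(north, east, down) components in nT; azimuth measured clockwise from North."""
    horizontal = total_nt * cos(radians(inclination_deg))
    return (horizontal * cos(radians(azimuth_deg)),
            horizontal * sin(radians(azimuth_deg)),
            total_nt * sin(radians(inclination_deg)))

def coil_field_for_rotation(total_nt, inclination_deg, target_azimuth_deg):
    """Field the coils must add so the resultant points at the target azimuth."""
    ambient = field_components(total_nt, inclination_deg, 0.0)
    target = field_components(total_nt, inclination_deg, target_azimuth_deg)
    return tuple(t - a for t, a in zip(target, ambient))

def horizontal_azimuth(north, east):
    """Azimuth of the horizontal component in degrees, 0-360."""
    return degrees(atan2(east, north)) % 360.0

ambient = field_components(48_900.0, 67.7, 0.0)
coil = coil_field_for_rotation(48_900.0, 67.7, -120.0)
resultant = tuple(a + c for a, c in zip(ambient, coil))
resultant_azimuth = horizontal_azimuth(resultant[0], resultant[1])
resultant_total = hypot(hypot(resultant[0], resultant[1]), resultant[2])
```

In this idealized case the coils add no vertical field, and the resultant horizontal component points to 240°, which is the same direction as −120°.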
Electromagnetic shielding of experimental huts. Most of the experiments were 
performed inside wooden huts (Extended Data Fig. 1a) placed at the Wechloy (Natural 
Sciences) Campus of the University of Oldenburg (Extended Data Fig. 1d) in the 
city of Oldenburg (population ca. 160,000; Extended Data Fig. 1c). Some of the ori- 
entation experiments in spring 2011 took place in an unscreened wooden shelter, 
normally occupied by horses, located in fields ca. 7.5 km from the University and 
ca. 1 km outside the built-up part of the city of Oldenburg (Extended Data Fig. 1c). 
An earth barrier in the form of a highway bridge foundation was located between 
the testing location and the city of Oldenburg. 

To attenuate time-dependent electromagnetic fields inside the wooden huts, the 
four walls (including the door) and the roof were covered with 1-mm-thick alumi- 
nium sheets, overlapping by at least 20 mm and bolted together with self-cutting 
screws every 5-10 cm (Extended Data Fig. 1b). We also tested whether the efficiency 
of the screens could be improved by adding aluminium sheets on the floor. No 
improvement was found, probably because negligible electromagnetic noise comes 
from below. Most of the experiments were therefore performed in huts screened on 
five sides in which the air-circulation was improved and the humidity variability 
reduced compared to shielding on six sides. 

The aluminium walls of this five-sided Faraday cage were interconnected at all 
times. In the grounded conditions, this aluminium screening assembly was elec- 
trically connected at a single location to a single grounding rod with a depth of 8 m. 
In the ungrounded conditions, the grounding rod was manually disconnected from 
the aluminium screening assembly. Disconnection of the grounding removed the
screening effect of the aluminium shields. In fact, the ungrounded aluminium
screens acted as an antenna that slightly increased the magnetic field intensity at 
some frequencies inside the test chambers compared to the unscreened condition 
(compare Fig. 1c, d with Fig. 2e, f). The disconnection of the grounding during the 
critical grounding/ungrounding experiments (Fig. 2) was performed by a member 
of the laboratory who was not involved in the behavioural experiments, and the 
persons performing and evaluating the experimental results were not aware of the 
change in conditions until after the completion of the experiments. 

All electronic devices were placed outside this cage, disconnected from their
protected earths and grounded via the same grounding rod as the Faraday cage. This is
necessary because the protected earth from the standard power outlet would act as 
an antenna and introduce electromagnetic noise into the system. When properly 
grounded, the shielding attenuated the time-dependent magnetic fields with fre- 
quencies up to at least 20 MHz by approximately two orders of magnitude inside 
the testing chambers. The screening efficiency was estimated by generating elec- 
tromagnetic noise just outside the chambers while measuring the electromagnetic 
noise arriving within. The anthropogenic electromagnetic noise observed at the 
University of Oldenburg is dominated by frequencies below 5 MHz. Higher fre- 
quency contributions were mostly at or below the detection limit of our equipment 
and are therefore not shown in Figs 1-5. 

Generation of electromagnetic noise. To produce electromagnetic noise, a passive
loop antenna (ETS Lindgren, Model 6511, 20 Hz-5 MHz) was placed vertically
under the centre of the central orientation funnel and aligned along the North- 
South axis (48 cm vertically from the centre of the loop to the central funnel). 

Broadband electromagnetic noise in the range 2 kHz-9 MHz was produced by a
signal generator (Hewlett Packard, 33120A, 15 MHz Arbitrary Waveform Gener- 
ator) connected to the antenna using either the maximum output (10 V peak-to-peak 
(pp)) for the strong noise condition or the minimum output (50 mV pp) for the 
weak noise condition used as a control (the generated noise with the output set to 
50 mV pp was weaker than the measurement limit except for the electric compo- 
nent below 500 kHz, see blue traces in Fig. 3d, e and Extended Data Fig. 2). An 
alternative to this control would have been to use the ‘silent shorting’ design suggested
by Kirschvink et al. We experimented with this method, but even the shorted
condition led to measurably increased electromagnetic fields inside the huts, which 
is why we chose the control described above. 

The band-pass electromagnetic noise (20 kHz-450 kHz and 600 kHz-3 MHz) was 
produced using a vector signal generator (Rohde & Schwarz, SMBV 100A, 9 kHz- 
3.2 GHz) connected to the same passive loop antenna. 

Measurements of time-dependent electromagnetic fields. The magnetic and 
electric components of the time-dependent electromagnetic fields were measured 
separately with different antennas connected to a signal analyser (Rohde & Schwarz,
FSV 3 Signal and Spectrum Analyzer 10 Hz-3.6 GHz). All such measurements were 
performed at a similar time of day as the behavioural experiments, but not while the 
actual tests were running. This procedure was chosen because we wanted to exclude 
any possibility that the measurements or measuring equipment could influence in 
any way the electromagnetic noise fields present while the birds were being tested. 

The magnetic component between 10 kHz and 5 MHz was measured with a calibrated
passive loop antenna (ETS Lindgren, Model 6511, 20 Hz-5 MHz). The electric
component between 10 kHz and 10 MHz was measured with a calibrated active
biconical antenna (Schwarzbeck Mess-Electronik, EFS 9218, 9 kHz-300 MHz).
The signal analyser was set to ‘max hold’ and a resolution bandwidth of 10 kHz. 
For each condition we measured the fields for a period of 40 min. The traces shown 
in Figs 1-5 are based on 5,000 measurement points between 10 kHz and 5 MHz. 
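
The 'max hold' setting can be pictured as keeping, for every frequency bin, the largest value seen across repeated sweeps. The following is a minimal sketch of that idea; the function name and the sweep values are hypothetical and do not reproduce the analyser's internal processing.

```python
from typing import Iterable, List, Optional, Sequence

def max_hold(sweeps: Iterable[Sequence[float]]) -> List[float]:
    """Return, for every frequency bin, the largest value seen across all sweeps."""
    held: Optional[List[float]] = None
    for sweep in sweeps:
        if held is None:
            held = list(sweep)
        elif len(sweep) != len(held):
            raise ValueError("all sweeps must have the same number of bins")
        else:
            held = [max(h, s) for h, s in zip(held, sweep)]
    return held if held is not None else []

# Three hypothetical sweeps over four frequency bins (arbitrary units).
sweeps = [
    [0.2, 0.5, 0.1, 0.3],
    [0.4, 0.1, 0.1, 0.9],
    [0.3, 0.6, 0.2, 0.2],
]
print(max_hold(sweeps))  # [0.4, 0.6, 0.2, 0.9]
```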

For the low-frequency range (5 Hz-32 kHz), we used the EFA-300 system (Narda 
Safety Solutions). The magnetic component was measured using the calibrated EFA 
Magnetic Field Probe 100 cm² (EFA-300 system, Narda Safety Solutions). The electric
component was measured with the calibrated Narda Electric Field Unit (EFA-300
system, Narda Safety Solutions). For each measurement, the antennas were
connected to the EFA-300 hand-held signal analyser, and this signal analyser was 
also set to ‘max hold’ and the fields were measured for a period of 40 min (Extended 
Data Fig. 2). 

It must be stressed that anthropogenic electromagnetic noise fields are always 
present but highly variable in their amplitude, phase and frequency spectrum. Two 
measurements of their intensity and frequency composition will never be identical. 
Consequently, the measurements shown in Figs 1c, d, 2e, f and 5c, d are representative
examples of the noise measured at the approximate time of day when the
experiments were performed. 

The maximal total magnetic field intensity (more precisely the magnetic flux 
density, B) in the frequency range between 10 kHz and 5 MHz was calculated using 
the following equation: 


$$B(\Delta f) = \frac{\Delta f}{N\,\Delta f_0} \sum_{i=1}^{N} B_i(f_i, \Delta f_0)$$

in which $B(\Delta f)$ denotes the total magnetic flux density in the bandwidth of interest,
$\Delta f$ = 5 MHz − 10 kHz = 4,990 kHz, and $B_i(f_i, \Delta f_0)$ is the magnetic flux density at the
$N$ different frequency values $f_i$ (every 1 kHz between 10 kHz and 5 MHz, that is,
$N$ = 4,990) for a resolution bandwidth $\Delta f_0$, which equals 10 kHz here. Expressed in
words, $B(\Delta f)$ = (the sum of the magnetic field intensity values/no. of values) ×
(frequency range size/resolution bandwidth), in our case: (the sum of the magnetic
field intensity values/4,990) × (4,990 kHz/10 kHz) for the total frequency range from
10 kHz to 5,000 kHz. Extended Data Table 1 lists these values for the different con-
ditions tested. 
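
The calculation described in words above can be sketched as follows. This is only an illustration under stated assumptions: the function name, its default arguments and the flat example spectrum are hypothetical, and the input is assumed to be a list of per-frequency flux density values in nT.

```python
from typing import Sequence

def accumulated_flux_density(values_nT: Sequence[float],
                             f_low_khz: float = 10.0,
                             f_high_khz: float = 5000.0,
                             rbw_khz: float = 10.0) -> float:
    """(sum of values / number of values) * (frequency range / resolution bandwidth)."""
    n = len(values_nT)
    if n == 0:
        raise ValueError("need at least one spectral value")
    mean_value = sum(values_nT) / n
    bandwidth_ratio = (f_high_khz - f_low_khz) / rbw_khz
    return mean_value * bandwidth_ratio

# Hypothetical flat spectrum: 4,990 points of 0.002 nT each.
flat_spectrum = [0.002] * 4990
print(accumulated_flux_density(flat_spectrum))
```

For this made-up flat spectrum the mean is 0.002 nT and the bandwidth ratio is 4,990 kHz/10 kHz = 499, so the accumulated value is close to 0.998 nT.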

Behavioural experiments. All birds were tested in so-called Emlen funnels* lined 
with scratch-sensitive paper’’, inside wooden huts (4 m × 4 m × ca. 3 m, Extended
Data Fig. 1a), where no directional cue other than the geomagnetic field was avail- 
able. In 2005, the experiments took place in these simple wooden huts. From 2007 
onwards, the walls and ceilings of the huts were lined with aluminium shields as 
described above. All electronic equipment was placed outside the hut in a separate 
wooden annex inside an aluminium box and grounded to minimize the generation 
of electromagnetic noise by the equipment itself. 

One hour (± 10 min) before the experiments started (half an hour before until
half an hour after sunset), the birds were placed outdoors in wooden transport cages
that allowed them to see parts of the evening sky. This gave the birds the possibility 
to calibrate their magnetic compass from twilight cues'”**. Immediately thereafter, 
the birds were placed in modified aluminium Emlen funnels (35 cm diameter, 15 cm 
high, walls 45° inclined*’), which were coated with thermal paper®’ on which the 
birds left scratches as they moved. The top of each funnel was covered with a trans- 
lucent Plexiglas lid that prevented the birds from seeing any landmarks in the hut. 
The overlap point of the paper was adjusted to one of the cardinal directions (N, S, 
E or W). This overlap point was changed randomly between huts and nights. This 
is important because the papers are always evaluated relative to the overlap point 
by researchers who do not know in which direction it was positioned. Even if some- 
one would intentionally try to ignore the condition-blinding protocols (this is highly 
unlikely), this procedure adds a second level of blinding, and it becomes impossible 
for ‘wishful thinking’ to influence the results in any way, since the persons evalu- 
ating the papers cannot know which geographical direction is equivalent to a given 
direction on the paper. The location of the overlap point is only revealed and taken 
into consideration after the primary evaluation of the papers has taken place (for 
procedures see below). 

The birds were tested for 1 h under dim white light conditions (2.1 mW m⁻²)
produced by incandescent bulbs (spectrum given in ref. 12). In each hut, nine birds 
were tested simultaneously. The birds were placed in a randomized funnel position 
each night and were put into the funnels from different directions, and we observed 
no systematic differences between the nine funnel positions or between the four huts. 
A second, and in a few instances a third, round of tests on a given night started 1.5 h 
(± 10 min) after the first or second round. In most cases, each bird was tested in a
different hut in each round but under the same magnetic field condition (NMF or 
CMF) and if applicable under the same time-dependent electromagnetic noise condition.
The results of the different tests can therefore be treated as independent.
The mean direction of each bird in each condition was calculated by unit-vector 
addition of the individual mean directions from the typically 3-15 tests per bird 
per condition in which the bird was judged to be oriented. 
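
Unit-vector addition of directions can be sketched as below. The function name and the example headings are hypothetical; the sketch simply treats each heading as a unit vector, sums the vectors and reads off the angle of the sum.

```python
import math
from typing import Optional, Sequence

def mean_direction_deg(directions_deg: Sequence[float]) -> Optional[float]:
    """Mean direction in degrees (0-360) by adding unit vectors; None if they cancel."""
    x = sum(math.cos(math.radians(d)) for d in directions_deg)
    y = sum(math.sin(math.radians(d)) for d in directions_deg)
    if math.hypot(x, y) < 1e-12:
        return None  # the vectors cancel, so no mean direction is defined
    return math.degrees(math.atan2(y, x)) % 360.0

# Hypothetical headings from three oriented tests of one bird.
print(mean_direction_deg([350.0, 10.0, 20.0]))
```

An arithmetic mean of 350°, 10° and 20° would give a misleading value near 127°, whereas vector addition gives a heading a few degrees east of north.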

If more than one condition was tested in a given season, the same experimental 
birds were tested in all conditions. The experimental condition experienced by a 
given bird was mostly interchanged every second day, and whenever possible, dif- 
ferent conditions were run simultaneously in different huts, so that any putative daily 
variation, for instance induced by the weather*’, would be averaged out amongst 
the experimental groups. 

In spring 2008, we decided to test the effect of the grounding of the shielding and 
performed experiments in two different huts. One of them was grounded (g) and
the other was left ungrounded (u) without the experimenters knowing which one
was which. The experimental condition for each bird alternated every other day; 
half the birds were tested in g-u-g-u conditions while the other half were u-g-u-g as 
follows: group 1, grounded on days 1 and 2, ungrounded on days 3 and 4, grounded 
on days 5 and 6, and ungrounded on days 7 and 8; group 2, ungrounded on days 1 
and 2, grounded on days 3 and 4, ungrounded on days 5 and 6, and grounded on 
days 7 and 8. The data from these measurements are presented in Fig. 2 as follows: 
Fig. 2a: group 1: days 1, 2 and group 2: days 3, 4. Figure 2b: group 2: days 1, 2 and 
group 1: days 3, 4. Figure 2c: group 1: days 5, 6 and group 2: days 7, 8. Figure 2d: 
group 2: days 5, 6 and group 1: days 7, 8. 
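
The alternation schedule and its mapping onto the panels of Fig. 2 can be written out compactly. The sketch below is only a restatement of the schedule given above; the function names are hypothetical.

```python
def condition(group: int, day: int) -> str:
    """Return 'g' (grounded) or 'u' (ungrounded) for group 1 or 2 on days 1-8."""
    if group not in (1, 2) or not 1 <= day <= 8:
        raise ValueError("group must be 1 or 2 and day must be between 1 and 8")
    block = (day - 1) // 2            # two-day blocks numbered 0..3
    starts_grounded = group == 1      # group 1 begins grounded, group 2 ungrounded
    grounded = (block % 2 == 0) == starts_grounded
    return "g" if grounded else "u"

def figure_panel(group: int, day: int) -> str:
    """Panel of Fig. 2 (a-d) in which the data for this group and day appear."""
    cond = condition(group, day)
    second_exposure = (day - 1) // 2 >= 2   # days 5-8 hold the second g and u blocks
    panels = {("g", False): "a", ("u", False): "b",
              ("g", True): "c", ("u", True): "d"}
    return panels[(cond, second_exposure)]

for group in (1, 2):
    print(group, [condition(group, day) + figure_panel(group, day) for day in range(1, 9)])
```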

In 2010 and 2011, we performed experiments in which we added broadband elec- 
tromagnetic noise (for details see above). The direction of the static magnetic field 
and electromagnetic noise conditions in a given hut were changed regularly; usually 
different conditions were tested concurrently in different huts on any given night. 

At the rural location, 12 European robins were tested simultaneously in a wooden 
shelter located in a field (Extended Data Fig. 1c). Here, the birds were tested under 
natural magnetic conditions without a magnetic coil system. Other testing proce- 
dures were the same as in the huts on the University campus. 

Before we started the experiments in any migratory season, we tested the birds 
in NMF and CMF conditions with no experimental manipulation for several nights
to ensure that they were in migratory mood and to get a control direction. 
Orientation data analysis. Independently, two researchers visually determined 
each bird’s mean direction to the nearest 10° from the distribution of the scratches 
without knowing the direction of the overlap-point of the paper or the magnetic 
field conditions experienced by the bird. If one of the two researchers considered 
the scratches to be randomly distributed and the other did not, or if the two inde- 
pendently determined mean directions deviated by more than 30°, a third indepen- 
dent researcher was asked to determine the mean direction. If this third individual 
determined a mean direction similar to one of the first two, and if the individual 
with initially differing opinion also agreed with this direction, the mean of the two 
similar directions was recorded as the orientation result. If the three independent 
researchers could not agree on a mean direction, the bird’s heading was defined as 
random and excluded from the analyses (7% of all tests). Birds with fewer than the 
pre-established lower limit of 100 scratches on the paper were considered inactive’ 
and were also excluded from the analysis (40% of all tests). The observers performed 
this screening before they knew the direction of the overlap-point (see above). In 
this way we can be certain that the person making the decision on whether the bird 
left more or less than 100 scratches was not influenced by the bird’s directional 
preferences. The average mean heading for each bird was calculated from all its 
oriented tests recorded under a given experimental condition. On the basis of these 
individual mean directions, group mean vectors were calculated by summing unit 
vectors in the mean directions of each individual bird and dividing by the total 
number of birds tested. The significance of the group mean vector was tested using 
the Rayleigh test**. 
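
The group mean vector length and a Rayleigh test on it might be computed along the following lines. This is a sketch under assumptions: the P value uses a widely used closed-form approximation rather than whatever software the analysis actually relied on, and the function name and example headings are hypothetical.

```python
import math
from typing import Sequence, Tuple

def rayleigh_test(directions_deg: Sequence[float]) -> Tuple[float, float]:
    """Return (r, p): group mean vector length and approximate Rayleigh P value."""
    n = len(directions_deg)
    if n == 0:
        raise ValueError("need at least one direction")
    x = sum(math.cos(math.radians(d)) for d in directions_deg)
    y = sum(math.sin(math.radians(d)) for d in directions_deg)
    resultant = math.hypot(x, y)   # length of the summed unit vectors
    r = resultant / n              # mean vector length, between 0 and 1
    # Closed-form approximation of the Rayleigh P value (an assumption of this sketch).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - resultant * resultant)) - (1 + 2 * n))
    return r, min(1.0, p)

# Hypothetical individual mean directions of ten birds.
headings = [10.0, 25.0, 355.0, 40.0, 15.0, 5.0, 30.0, 350.0, 20.0, 45.0]
r, p = rayleigh_test(headings)
print(f"r = {r:.2f}, P = {p:.2g}")
```

A value of r close to 1 with a small P indicates that the individual mean directions cluster around a common heading; r close to 0 indicates no shared direction.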
Extended Data Figure 1 | Wooden huts and experimental locations.

a, Photograph of one of the four identical wooden huts used for our 
experiments. b, Photograph from the inside of an experimental hut showing the 
aluminium screening, parts of the Merritt coil system, and the table on which 
the funnels were placed. The insert shows the self-cutting screws used to 
connect the aluminium plates. c, Simple map of the city of Oldenburg. Built-up 
areas are shown in grey and nature-protected areas in green. Black lines denote 
highways, blue denotes water. Red stars: ‘1’ indicates the location of the
University campus and ‘2’ the rural location used for some of the tests. 

d, Map of the University of Oldenburg Wechloy Campus. 1, main University 
building housing the biology, chemistry, physics and mathematics institutes; 
2, botanical greenhouse; 3, iron-free wooden building; 4, the locations of the 
four wooden huts used for our experiments; 5, ‘Next Energy’ building.
Extended Data Figure 2 | Electromagnetic noise measurements in the range
from 40 Hz to 32 kHz. a, Magnetic field intensity (B). b, Electric field intensity 
(E). The colour coding of the traces corresponds to Fig. 4. Notice that the 
frequency-axis (f) is logarithmic. 
Extended Data Table 1 | The accumulated time-dependent magnetic field intensity summed over all the frequencies in the spectra recorded
for each behavioural test condition

Condition   Oldenburg     Oldenburg     Oldenburg     Rural         10 kHz-5 MHz  10 kHz-5 MHz  20-450 kHz    600 kHz-3 MHz
            unshielded    grounded      ungrounded    location      strong        weak          bandpass      bandpass
                          shielding     shielding

Trace       Figs 1 and 5  Figs 1 and 2  Fig. 2        Fig. 5        Figs 3 and 4  Figs 3 and 4  Fig. 4        Fig. 4
            red trace     blue trace    red trace     blue trace    red trace     blue trace    green trace   black trace

f (kHz)     Accumulated field intensity (nT)

            1008.10       :             827.86        2.60          278.88        3.31          133.09        34.83
            714.17        :             705.88        1.74          229.56        2.35          101.11        33.98
            855.63        :             125.59        0.31          119.30        0.69          128.27        0.71
            81.89         :             561.05        0.79          98.81         1.00          1.16          32.48
Dynamics and associations of microbial community
types across the human body 


Tao Ding¹ & Patrick D. Schloss¹


A primary goal of the Human Microbiome Project (HMP) was to 
provide a reference collection of 16S ribosomal RNA gene sequences 
collected from sites across the human body that would allow micro- 
biologists to better associate changes in the microbiome with changes 
in health’. The HMP Consortium has reported the structure and 
function of the human microbiome in 300 healthy adults at 18 body 
sites from a single time point”*. Using additional data collected over 
the course of 12-18 months, we used Dirichlet multinomial mixture 
models* to partition the data into community types for each body 
site and made three important observations. First, there were strong 
associations between whether individuals had been breastfed as an 
infant, their gender, and their level of education with their commu- 
nity types at several body sites. Second, although the specific taxo- 
nomic compositions of the oral and gut microbiomes were different, 
the community types observed at these sites were predictive of each 
other. Finally, over the course of the sampling period, the community 
types from sites within the oral cavity were the least stable, whereas 
those in the vagina and gut were the most stable. Our results dem- 
onstrate that even with the considerable intra- and interpersonal 
variation in the human microbiome, this variation can be parti- 
tioned into community types that are predictive of each other and 
are probably the result of life-history characteristics. Understanding 
the diversity of community types and the mechanisms that result in 
an individual having a particular type or changing types, will allow 
us to use their community types to assess disease risk and to person- 
alize therapies. 

Building on previous analysis of a healthy cohort of 300 individuals, we 
analysed a 16S rRNA gene sequence data set from the HMP Consortium”». 
The final data release for this cohort provided 16S rRNA gene sequence 
data and clinical metadata (Extended Data Table 1) from two time points 
for each of 300 healthy individuals and from a third time point for 100 of 
the individuals at 15 body sites for men and 18 for women’; the interval 
between samplings varied between 30 and 451 days (median = 224 days). 
A significant difficulty in analysing microbiome data has been the con- 
siderable intra- and interpersonal variation in the composition of the 
human microbiome*®’. A recently proposed approach for overcom- 
ing this difficulty within the gastrointestinal tract has been the concept 
of enterotypes, or more generically, stool community types**”. In this 
approach samples are clustered into bins based on their taxonomic 
similarity. Specific enterotypes have been associated with the amount 
of protein, fat and carbohydrates in one’s diet, obesity, inflammatory 
bowel disease, and Crohn’s disease*?-!". Others have found associations 
between specific vaginal community types and the sexually transmitted 
Trichomonas vaginalis, pH, and ethnicity’?-“* and associations between 
skin community types and psoriasis’*. Using bacterial community struc- 
tures collected from 18 body sites and up to three time points, we applied 
community typing analysis to understand better the factors that affect 
the structure of the microbiome and contribute to human health. 

Concern has been expressed regarding whether community types 
reflect partitioning of an abundance gradient or the presence of clusters 
of relative abundance profiles*"®. Two general approaches have been 
developed to assign samples to community types: partitioning around 


the medoid (PAM) and Dirichlet multinomial mixture (DMM) models**. 
To compare these methods we first generated simulated communities 
where there were one or four community types. Analysis of the sim- 
ulated communities indicated that the negative log model evidence 
metric used by the DMM-based approach was superior to the metrics 
used to assess clusters within the PAM-based approach (Supplemen- 
tary Information). Next, we assigned the samples for each body site to 
community types using both methods. Calculation of the negative log 
model evidence demonstrated that the community types identified using 
DMM were superior to those identified using the PAM-based approach 
(Extended Data Table 2 and Extended Data Fig. 1). Thus, our analysis 
of simulated data and the HMP data suggests that the community types 
represent clusters of relative abundance profiles. 
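
To make the DMM idea more concrete, the sketch below evaluates the Dirichlet-multinomial log-probability of one sample's genus counts and uses it to compute how strongly each mixture component (community type) would claim that sample. It does not fit a model or compute the Laplace approximation to the model evidence; the function names, Dirichlet parameters, mixture weights and counts are all invented for illustration.

```python
import math
from typing import List, Sequence

def dirichlet_multinomial_logpmf(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of a count vector under a Dirichlet-multinomial distribution."""
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return log_p

def responsibilities(counts: Sequence[int],
                     alphas: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Posterior probability of each mixture component for one sample."""
    logs = [math.log(w) + dirichlet_multinomial_logpmf(counts, al)
            for w, al in zip(weights, alphas)]
    top = max(logs)                       # subtract the maximum for numerical stability
    scaled = [math.exp(v - top) for v in logs]
    total = sum(scaled)
    return [s / total for s in scaled]

# Two hypothetical community types over two genera, and one hypothetical sample.
alphas = [[9.0, 1.0], [1.0, 9.0]]
weights = [0.5, 0.5]
print(responsibilities([90, 10], alphas, weights))
```

A sample dominated by the first genus is assigned almost entirely to the first hypothetical type, which is the sense in which samples are partitioned into community types.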

Using the DMM-based approach, we identified between two (anterior 
nares) and seven (tongue dorsum) community types per body site (see 
Source Data associated with Fig. 1 for community data and DMM fits). 
As an example, bacteria from stool samples fell into four distinct com- 
munity types (Fig. 1a). We observed that 63 genera were needed to account 
for 90% of the difference between a model with a single community 
type and four community types (see Source Data associated with Fig. 1). 
Thus, it was not merely the most abundant bacterial population that 
differentiated the types as has been previously reported (for example, 
Bacteroides, Prevotella, or Ruminococcus)**°”’; rather, community types 
were identified based on complex configurations of numerous taxa. 
In fact, this supports the findings of the original study; that is, the taxa 
that typify each enterotype represent networks of co-occurring bac- 
terial populations’. Inspection of the five most important genera, which 
accounted for 54% of the difference in fit between four community types 
and one, indicated that each community type represented a cluster of 
relative abundance profiles (Fig. 1b). Community type A had the highest 
levels of Bacteroides but lacked Prevotella and Ruminococcaceae. Similar 
to community type A, community type C also lacked Prevotella, but 
had a lower relative abundance of Bacteroides and had higher levels of 
Alistipes, Faecalibacterium and Ruminococcaceae. Community type D 
had fewer Bacteroides than community types A and C, but had higher 
levels of Prevotella. Community type B had the fewest Bacteroides and 
was dominated by a variety of populations affiliated within the Firmi- 
cutes. Furthermore, the diversity of the samples assigned to each of the 
community types indicated that type A had a significantly lower diver- 
sity than the other three types (P < 0.001). Community types A, C and 
D resembled the previously identified Bacteroides, Ruminococcus and
Prevotella enterotypes, respectively’”®’’. Analysis of the other body sites 
yielded analogous patterns. 

Using the responses that subjects gave to an extensive survey (sum- 
marized in Extended Data Table 1), we identified demographic and life- 
history characteristics that could be correlated with different community 
types at each body site. Of the numerous characteristics tested, we observed 
significant associations between community types and whether the sub- 
ject was ever breastfed, their gender, and their education level (see Source 
Data associated with Fig. 1). Whether an individual was ever breastfed 
was strongly associated with their stool community type (P= 1X 10° * 
Fig. 1c). Individuals who had been breastfed at some point as infants 
Figure 1 | Analysis of stool samples reveals four community types. a, Fitting
the genera-level relative abundance data from 597 stool samples to Dirichlet 
multinomial mixture models provided support for four types when using the 
Laplace approximation to the negative log model evidence. b, The relative 
abundance of the most abundant genera in the samples assigned to each of the 
types (the boxes represent the interquartile range and the error bars represent 
the 95% confidence interval; n (community type A) = 221; n (community 
type B) = 15; n (community type C) = 80; n (community type D) = 281). 

c, d, There were significant associations between stool community types 

(n = 287 unique individuals) and whether the subject was breastfed as an infant 
(c; median P= 1 X 10 “*) and their gender (d; median P= 4 X 10“). 


were 2.4-times more likely to belong to community type A, and those 
who were not breastfed were 2.2-times more likely to belong to com- 
munity type D. Gender was associated with community types identified 
in the stool (P= 4 X 107+; Fig. 1d), tongue (P = 2 X 10 °; Extended 
Data Fig. 2a), right retroauricular crease (P = 9 X 10 °; Extended Data 
Fig. 2b), and right antecubital fossa (P = 3 X 10° °; Extended Data Fig. 2c).
For example, men were 3.0-times more likely than women to harbour 
stool community type D (Fig. 1b). Whether a woman had a baccalau- 
reate degree had a strong association with the community types observed 
within the vaginal introitus (P = 2 107°; Extended Data Fig. 3a), mid 
vagina (P= 8 X 10 4; Extended Data Fig. 3b), and posterior fornix
(P=4X 10 “*; Extended Data Fig. 3c). At each of these sites, women
with a baccalaureate degree were more likely to be dominated by Lacto- 
bacillus (type E) and those without a baccalaureate degree were likely 
to have very low levels of Lactobacillus and moderate abundances of 
Atopobium, Prevotella, Bifidobacterium and unclassified members of 
the Firmicutes (type D). Together, our analysis indicates that an indi- 
vidual’s life-history characteristics can be associated with their micro- 
biome composition. 

The second important observation that we identified was that the com- 
munity type at one body site was predictive of the community type at 
another body site. Previously, cross-body site comparisons were made by 
calculating the ecological distance between samples collected at different 
body sites based on the taxonomic composition of those communities’. 
Our approach allowed us to identify similar associations within a body 
region (for example, oral, skin, vagina), but also allowed us to detect asso- 
ciations between communities that had very different taxonomic compo- 
sitions. Community type membership was correlated among sites within 
the oral cavity, in the vagina, and between the left and right antecubital
fossa and the left and right retroauricular crease (Fig. 2). Surprisingly, 
stool samples showed a significant association with samples from within 
the oral cavity; the strongest association was with the community types 
observed in saliva (P = 10 *; Extended Data Table 3). Saliva was domi- 
nated by members of the Prevotella, Streptococcus, Pasteurellaceae, 
Veillonella and Fusobacterium; among these taxa, only Prevotella were 
abundant in the stool communities. Individuals with stool community 
type D, which had the highest level of Prevotella, were 2.1-times more 
likely to harbour saliva community types A and C, which were also high 
in Prevotella relative to saliva community types B and D. Stool com- 
munity types A and C, which had low levels of Prevotella, were less likely 
to co-occur with saliva community types A and C (Extended Data Table 3). 
These results are intriguing because they suggest that although the oral 
and stool communities share little taxonomic resemblance, oral bac- 
terial populations seed the gut, and those populations experience the 
ecological environment of the gut to give rise to consistent community 
types by the time they reach the stool. 
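
Associations between community type memberships were compared using Fisher's exact test (Fig. 2). As a simplified illustration, the sketch below implements the two-sided test for a 2 × 2 table only, whereas comparisons of several community types would involve larger tables. The function name and the counts are hypothetical.

```python
import math

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    total = sum(p for p in (prob(k) for k in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: rows are two stool types, columns are two groups of saliva types.
print(fisher_exact_two_sided(30, 10, 12, 28))
```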

Aside from life-history characteristics and inoculation from other 
body sites, the structure of the human microbiome is probably shaped 
by an individual’s recent interactions with their environment, diet, med- 
ications, and overall health. We quantified the stability of each community 
type at every body site by estimating the probability that the type would 
change between sampling visits (Fig. 3a). The most stable body sites 
were in the stool and vagina and the least stable site was the supragin- 
gival plaque. Among the four stool community types, type D was the 
most stable followed by types A, C and B (Fig. 3b). Unfortunately, the 
metadata describing changes in health or lifestyle are unable to provide 
us with an explanation for why community types change. 
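
One simple way to estimate how often a community type changes between visits is to count, for each starting type, the fraction of sample pairs whose type differs at the following visit. The sketch below shows that bookkeeping only; the function name and the example pairs are hypothetical and do not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def change_fraction(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of samples starting in each type whose type differs at the next visit."""
    totals: Counter = Counter()
    changed: Counter = Counter()
    for first, second in pairs:
        totals[first] += 1
        if second != first:
            changed[first] += 1
    return {t: changed[t] / totals[t] for t in totals}

# Hypothetical (type at one visit, type at the next visit) pairs.
visits = [("A", "A"), ("A", "C"), ("D", "D"), ("D", "D"), ("B", "A")]
print(change_fraction(visits))  # {'A': 0.5, 'D': 0.0, 'B': 1.0}
```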

The human microbiome is a complex ecosystem that varies consid- 
erably across the body and between individuals. This study demonstrates 
that given the myriad permutations of genetics, life histories, behaviours, 
environments and exposures, an individual’s microbiome is an emergent 
property whereby a potentially limitless number of microbial community 
structures can be distilled into a finite number of types. Knowledge of 
Figure 2 | Community-type associations are strongest within a body region,
but also exist between stool and the oral cavity. Heat-map colours represent 
the magnitude of the median P value for the comparison of community type 
membership using Fisher’s exact test. Median P values are found in the 
Source Data. 
that community type stability is correlated with the diversity of the
community type. a, The community types at each body site differ in the 
fraction of samples that change their community type membership between 
visits. (Size of circles represents percentage of samples that affiliated with each 


the factors that affect one’s community type profile will be critical as they 
continue to be associated with predisposition to diseases. Furthermore, 
understanding why community types change will be useful in devel- 
oping therapies that can alter one’s community type using pre- and 
probiotics, faecal transplants, or antibiotics. Given the varying levels of 
flux between community types at different body sites, it is remarkable 
that we were still able to detect life-long legacy effects on the micro- 
biome, such as whether the subject was ever breastfed as an infant. This 
result could represent a true long-term impact of breastfeeding on the 
microbiome or it could represent the effect of the individual’s childhood 
environment or care. The result raises the possibility that there may be 
other legacy effects on the microbiome, such as duration of breastfeed- 
ing, mode of birth, level of early antibiotic exposure, and childhood 
disease'**°. The four gender-based associations are intriguing and 
support previous studies showing that men and women have different 
skin communities” and that autoimmune diseases may be mediated 
via the microbiome and hormonal differences”. The association between 
one’s level of education and their vaginal microbiome type is less clear; it 
is most likely that a baccalaureate degree represents a composite variable 
of numerous factors known to affect the vaginal microbiome, including 
race/ethnicity, sexual behaviour and socioeconomic class. Regardless, 
that such considerable variation was observed among a population of 
healthy women supports the observation that there is no single normal 
vaginal microbiome”’; this is probably true for every body site. Look- 
ing forward, prospective studies that include individuals with varied 
levels of health and varied backgrounds (study groups that are more 
representative of society) are needed to achieve a better understanding 
of the mechanisms of change in community types as well as to provide 
more details about correlations between community type and life-history 
factors such as genetics, age, diet, health status, and environment (that is, 
rural or urban). Furthermore, future prospective studies with a longit- 
udinal component need to control for the time between samplings and 
perhaps synchronize sampling with host physiology (for example, 
menses). Perhaps most exciting is the prospect that community types 
may be associated with complex diseases such as bacterial vaginosis, 
periodontitis, cancer, and diabetes where it has not been possible to 
establish a causative relationship between a specific bacterium and the 
disease. 
community type and the vertical line represents the weighted average.) b, Rate
of change between stool community types (n (community type A) = 221; 

n (community type B) = 15; n (community type C) = 80; n (community type 
D) = 281). The numbers on directed edges indicate the percentage of samples 
that changed community types. 


METHODS SUMMARY 


The Human Microbiome Project carried out three phases of sequencing the 16S 
rRNA gene and we obtained the unprocessed data for the V35 region from the 
NCBI Short Read Archive (SRA): the Clinical Pilot Project (accession SRP002012), 
Phase I (accession SRP002395) and Phase II (accession SRP002860). The Clinical 
Pilot Project and Phase I data sets have been described previously”*. The metadata 
and clinical data associated with the samples from the subjects were obtained from 
dbGap (accession phs000228.v3.p1). The 16S rRNA gene sequence curation pipeline
was implemented using the mothur software package
(http://nbviewer.ipython.org/gist/pschloss/9815766/notebook.ipynb)”**”*.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Sequence analysis pipeline. The Human Microbiome Project (HMP) carried out three
phases of 16S rRNA gene sequencing, all performed on the 454 Titanium sequencing
platform. We obtained the unprocessed sff files for the V35 region from the NCBI
Short Read Archive (SRA) for each of these phases: the Clinical Pilot Project
(accession SRP002012), Phase I (accession SRP002395) and Phase II (accession
SRP002860). The Clinical Pilot Project and Phase I data sets have been described
previously. Reads were generated from the 3′ to the 5′ end of the 16S rRNA gene.
Although the V13 and V69 regions were also sequenced by the HMP sequencing centres,
the number of data sets generated for those regions was considerably smaller than
for the V35 region. The 16S rRNA gene sequence curation pipeline was implemented
using the mothur software package. This approach has been shown to result in a
sequencing error rate of 0.02%. Briefly, flowgrams were extracted from the sff
files, and any that had more than one mismatch to the barcode, had more than two
mismatches to the primer, had fewer than 450 flows, contained homopolymers longer
than 8 nucleotides, or contained an ambiguous base call were culled. The flows for
each sequencing run were trimmed to 450 flows and de-noised separately using the
PyroNoise algorithm as implemented within mothur. The de-noised sequences were then
aligned against a customized reference alignment based on the SILVA database using
the NAST algorithm implemented within mothur. The customized database included
small subunit rRNA sequences from bacteria, archaea, eukarya, chloroplasts and
mitochondria. Sequences that did not align to the predicted V35 region were culled
from further analysis and the alignments were trimmed so that the sequences fully
overlapped the same alignment coordinates. These sequences were then subjected to a
pre-clustering step that first sorted the sequences by their abundance within each
sample and then merged the abundance of a sequence into that of a more abundant
sequence if the two were within 2 nucleotides of each other. Treating each sample
separately, we screened each sequence for chimaeras using the de novo UChime
chimaera detection algorithm. Once chimaeric sequences were culled from the data
sets, the sequences were classified using the naive Bayesian classifier trained
against a customized version of the RDP training set (version 9) as implemented
within mothur. The training set was customized by supplementing it with sequences
derived from chloroplasts, mitochondria and members of the Eukarya. The reference
sequences were trimmed to include only the V35 region of the 16S rRNA gene. We
required a minimum classification confidence score of 80% and used 1,000
pseudo-bootstrap iterations. Because the PCR target was bacterial 16S rRNA gene
sequences, we culled any sequences that were classified as being derived from
archaea, eukarya, mitochondria or chloroplasts, or that could not be classified to
a kingdom with at least 80% confidence. The taxonomy of the remaining sequences was
used to assign the sequences to genus-level phylotypes. Those sequences without a
genus-level classification were assigned to a phylotype represented by the
lowest-level taxonomy with a confidence score of at least 80%. This allowed us to
create a table of counts for the number of times each genus-level phylotype was
observed in each sample. As some samples were sequenced multiple times to obtain
additional sequence data, we pooled replicate sequencing runs to create a single
sample. Samples with fewer than 1,000 reads were removed from further analysis and
all samples were either sub-sampled or rarefied (n = 1,000 iterations) to 1,000
reads to perform subsequent analyses. Sub-sampling and rarefaction were necessary
to limit the effects of differential sampling that are known to affect alpha and
beta diversity metrics and differentially increase the representation of PCR and
sequencing artefacts in data sets.
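
As an illustration only, the final sub-sampling and rarefaction step can be sketched in Python. This is not the published pipeline, which used mothur. The function names, the example counts, the choice of sampling without replacement and the use of phylotype richness as the rarefied statistic are all assumptions made for the sketch.

```python
import random
from collections import Counter


def _read_pool(counts):
    """Expand a {phylotype: read count} table into one entry per read."""
    return [taxon for taxon, n in counts.items() for _ in range(n)]


def subsample_counts(counts, depth=1000, seed=None):
    """Draw `depth` reads at random (assumed: without replacement).

    Returns None when the sample has fewer than `depth` reads, mirroring
    the removal of samples with fewer than 1,000 reads.
    """
    pool = _read_pool(counts)
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


def rarefied_richness(counts, depth=1000, iterations=1000, seed=0):
    """Mean number of phylotypes seen across repeated sub-samples.

    Richness is only an example statistic chosen for this sketch.
    """
    pool = _read_pool(counts)
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    total = 0
    for _ in range(iterations):
        total += len(set(rng.sample(pool, depth)))
    return total / iterations


# Hypothetical sample with 2,000 reads across three phylotypes.
sample = {"Bacteroides": 1500, "Prevotella": 400, "Unclassified": 100}
sub = subsample_counts(sample, seed=1)
print(sum(sub.values()))
```

Fixing the sampling depth in this way means that every sample contributes the same number of reads to downstream diversity calculations, which is the stated purpose of the step.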

Assignment to community types. The table of counts was partitioned according to
the 18 body sites. Because the communities were similar and we wanted to use the
maximum number of samples per body site when assigning samples to community types,
we pooled the three vaginal body sites (that is, vaginal introitus, mid-vagina and
posterior fornix) into one vaginal data set, the two antecubital fossa sites (that
is, left and right) into one antecubital fossa data set, and the two retroauricular
crease sites (that is, left and right) into one retroauricular crease data set. The
resulting 14 tables were used as input to partition the samples according to
community types at each body site using the Dirichlet multinomial mixture model. We
selected the number of community types at each body site by choosing the number of
components that gave the minimum Laplace approximation to the negative log model
evidence. Samples were assigned to their community type based on the maximum
posterior probability. For all body sites, between 89.2% and 99.7% of the samples
had a posterior probability of at least 0.90. The mean abundance and 95% confidence
interval predicted by the model are provided for each body site.
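
The selection and assignment rules described above can be sketched as follows. This assumes that a mixture-model fit has already produced a Laplace value for each candidate number of components and a posterior probability matrix. The function names and all numbers are hypothetical, and the model fitting itself is not shown.

```python
def select_num_components(laplace_by_k):
    """Return the number of components with the smallest Laplace value."""
    return min(laplace_by_k, key=laplace_by_k.get)


def assign_community_types(posteriors, labels):
    """Assign each sample to the type with the highest posterior probability.

    `posteriors` is a list of rows, one per sample, with one probability
    per community type; `labels` names the types in column order.
    """
    assignments = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        assignments.append((labels[best], row[best]))
    return assignments


def fraction_confident(assignments, threshold=0.90):
    """Fraction of samples whose winning posterior is at least `threshold`."""
    return sum(p >= threshold for _, p in assignments) / len(assignments)


# Hypothetical Laplace values for fits with 1, 2 and 3 components.
laplace = {1: 900.0, 2: 850.5, 3: 870.2}
k = select_num_components(laplace)

# Hypothetical posterior probabilities for three samples under k = 2.
posteriors = [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92]]
assigned = assign_community_types(posteriors, labels="AB")
print(k, assigned, fraction_confident(assigned))
```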

Selection of metadata. A large amount of metadata and clinical data were collected
for each of the samples and subjects. We obtained the most recent version
of these data from dbGap (accession phs000228.v3.p1). Because of the uniformity 
and healthy nature of the cohort, a number of the clinical data fields could not be 
included in our analysis. Furthermore, there was evidence that several variables 
were collected from subjects in one city but not the other (see Supplementary
Information for a discussion of the difficulties in analysing the city of origin
data). We interrogated, a priori, the categorical metadata to identify those
variables for which at least 10 subjects exhibited the condition and for which the
condition was represented among the subjects from both cities. In addition, to
increase the number of variables
under consideration, we pooled responses. For example, there were 13 categories 
for country of birth with only one of those having more than 10 respondents (US/ 
Canada; n = 260). In this case we pooled the other responses to create a non-US/ 
Canada group (n = 40). We used a similar pooling strategy for parents’ country of 
origin, meat eaters/vegetarians, number of children the subject had given birth to, 
occupation, and level of education. The data available through dbGap partitioned 
medications into broad categories and indicated whether the subject was using the 
medication at the time of the visit. This created three classes of subjects. The first 
never used the medication during the study, the second class used the medication 
for one or two of the visits, but not all of their samples, and the third class used the 
medication for all of their visits. For the purpose of correlating medication usage 
with community type, we used the data for the first and third classes of subjects and 
ignored the second. The number of subjects in the second class was below 10 for 
each type of medication. For example, there were only 8 subjects that had more 
than one visit and used antibiotics within 30 days of any of their visits. Because of 
the general paucity of subjects in this category of medication users, it was not pos- 
sible to associate medication usage with changes in community type. Finally, we 
converted the subjects’ body mass index (BMI) into categories of normal (18.5-25), 
overweight (25-30) and obese (>30); there were no underweight subjects (<18.5). 
The resulting list of categorical clinical data that were considered is provided in 
Extended Data Table 1. In addition to these categorical data, we also had access to 
continuous clinical metadata for each of the subjects. These included their age, 
BMI, pulse and blood pressure. A summary of these data is provided in Extended 
Data Table 1. 
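
The two recoding steps described above, pooling sparse categories and binning BMI, can be sketched as below. The function names are hypothetical. The handling of values exactly on a boundary (25 counted as overweight, 30 counted as overweight because obese is given as >30) is an assumption, since the text lists only the ranges. The `keep` threshold follows the "at least 10 instances" criterion.

```python
from collections import Counter


def pool_rare_categories(responses, min_count=10, pooled_label="other"):
    """Replace every category seen fewer than `min_count` times with one label."""
    counts = Counter(responses)
    return [r if counts[r] >= min_count else pooled_label for r in responses]


def bmi_category(bmi):
    """Bin a BMI value into the categories used in the text.

    Boundary handling is an assumption of this sketch.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi <= 30:
        return "overweight"
    return "obese"


# Hypothetical responses reproducing the counts quoted in the text:
# one category with 260 subjects and 12 sparse categories totalling 40.
responses = ["US/Canada"] * 260
responses += [f"country_{i}" for i in range(8) for _ in range(4)]
responses += [f"country_{i}" for i in range(8, 12) for _ in range(2)]

pooled = Counter(pool_rare_categories(responses, pooled_label="non-US/Canada"))
print(pooled)
print(bmi_category(24), bmi_category(27.5), bmi_category(31))
```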
Tests of association. Because individuals provided up to three samples it would 
have been arbitrary to select one visit from each subject (for example, the first visit) 
on which to base our analyses. For the categorical metadata associations, we per- 
formed an iterative procedure where we selected a single visit for each individual’s 
body site and tested the association between the community type at the body site 
and the metadata using Fisher’s exact test. For the continuous metadata we performed 
a similar procedure except we tested the association between the community type 
at the body site and the metadata using analysis of variance. Finally, to test associa- 
tions across the body we performed a randomization procedure where each itera- 
tion consisted of selecting one visit for each individual and then testing for inter-body 
site associations using Fisher's exact test. We performed 1,000 iterations and
calculated the percentage of iterations in which each variable was significant
after applying the Benjamini-Hochberg step-up procedure to limit the false
discovery rate to 5%. We report the median P value and the percentage of iterations
that were significant.
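
The summary step of this procedure can be sketched as follows. The sketch takes the P values as given, so the visit selection and the Fisher or analysis-of-variance tests are not shown. It assumes that the Benjamini-Hochberg correction is applied across the variables tested within each iteration; the text does not spell this out, and the function names and example P values are hypothetical.

```python
import statistics


def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking P values significant under BH.

    Step-up rule: find the largest rank r (1-based, ascending P) with
    p_(r) <= r * fdr / m, and call every P value of rank <= r significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


def summarize_iterations(p_by_iteration, fdr=0.05):
    """Median P value and percentage of significant iterations per variable.

    `p_by_iteration` holds one list of P values (one per variable) for
    each randomized selection of visits.
    """
    n_iter = len(p_by_iteration)
    hits = [0] * len(p_by_iteration[0])
    for p_values in p_by_iteration:
        for j, sig in enumerate(benjamini_hochberg(p_values, fdr)):
            hits[j] += sig
    medians = [statistics.median(col) for col in zip(*p_by_iteration)]
    percent = [100 * h / n_iter for h in hits]
    return medians, percent


# Three hypothetical iterations testing two variables.
medians, percent = summarize_iterations([[0.001, 0.5], [0.002, 0.6], [0.2, 0.7]])
print(medians, percent)
```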

A complete description of the analysis pipeline, including scripts, mothur commands
and intermediate files, is available at
http://nbviewer.ipython.org/gist/pschloss/9815766/notebook.ipynb.

Extended Data Figure 2 | The frequency of community types for body sites where
there was a significant association with the subject's gender. a-c, Percentage of
female and male tongue communities that affiliated with … retroauricular crease
(b; n = 268 unique individuals; median P = 9 × 10^…) and right antecubital fossa
community types (c; n = 136 unique individuals; median P = 3 × 10^…).
©2014 Macmillan Publishers Limited. All rights reserved


[Figure: panels a-c for vaginal introitus, mid-vagina and posterior fornix; x axis, community type (A-E)]

Extended Data Figure 3 | The frequency of vaginal community types among women with
and without a college degree. a-c, Percentage of women with and without a college
degree whose vaginal communities affiliated with the vaginal introitus (a; n = 74
unique individuals; median P = 2 × 10^…), mid-vagina (b; n = 64 unique individuals;
median P = 8 × 10^…) and posterior fornix (c; n = 61 unique individuals;
median P = 4 × 10^…) community types.

Extended Data Table 1 | Most common characteristics of the individuals included in the HMP healthy cohort

Categorical data
Sampled in Houston / St. Louis
Female / Male
Born in US or Canada
Mother born in US or Canada
Father born in US or Canada
Hispanic, Latino, or Spanish
Asian
Black
White
Ever breastfed as infant
Eats meat at least once a week
Occupation: Student
Had given birth at least once
College educated
Had dental insurance
Had health insurance
Tobacco user
Chronic use of antidepressants
Chronic use of antihistamines
Chronic use of hormonal contraceptives
Chronic use of vitamins or supplements
Normal BMI
Overweight BMI
Obese BMI

Number of individuals (Total=300)
150 / 150
198 (Forgot=33, NA=23)
261 (NA=23)
72 (Tr=15; NA=149)
34 (Tr=10)

Continuous data              Median (Min-Max)
Age                          25 (18-40)
BMI                          24 (19-34)
Pulse                        71 (42-100)
Diastolic pressure           71 (50-98)
Systolic pressure            119 (91-151)
pH (vaginal introitus)       4.4 (3.3-6.5)
pH (posterior fornix)        4.0 (3.2-7.0)

NA, data were not collected; Forgot, the subject could not recall; Tr, the medication was used transiently throughout the two or three sampling events.

Extended Data Table 2 | Comparison of PAM- and DMM-based approaches to assigning samples to community types

                        PAM-based using SI Index       PAM-based using CH Index       DMM-based
Body site               SI Index  Clusters  Laplace    Clusters  CH Index  Laplace    Clusters  Laplace
Antecubital fossa       0.34      …         84858.4    …         114.3     84858.4    …         83302.1
Anterior nares          0.32      …         52136.3    …         153.5     51864.3    …         51532.0
Buccal mucosa           0.23      …         65643.3    …         166.2     64968.1    …         64588.8
Hard palate             0.28      …         72686.9    …         208.4     71573.8    …         71436.9
Keratinized gingiva     0.38      …         51392.2    …         323.8     51392.2    …         50605.3
Palatine tonsils        0.27      …         82655.3    …         237.7     82655.3    …         81446.7
Retroauricular crease   0.51      …         95797.5    …         719.9     95797.5    …         94673.5
Saliva                  0.19      …         81261.3    …         120.4     81261.3    …         80656.1
Stool                   0.40      …         76228.5    …         194.0     76228.5    …         74785.6
Subgingival plaque      0.25      …         90876.7    …         249.4     90876.7    …         89672.2
Supragingival plaque    0.24      …         78982.0    …         217.6     78982.0    …         78357.1
Throat                  0.26      …         79238.0    …         177.8     79238.0    …         78052.8
Tongue dorsum           0.33      …         71442.3    …         293.2     71442.3    …         69923.0
Vagina                  0.57      …         32407.3    …         205.7     32150.8    …         31209.5



Extended Data Table 3 | Average contingency table of stool and saliva community types

          Saliva A  Saliva B  Saliva C  Saliva D
Stool A   0.101     0.140     0.104     0.044
Stool B   0.003     0.000     0.002     0.021
Stool C   0.136     0.173     0.107     0.027
Stool D   0.048     0.024     0.052     0.017

The median P value from a Fisher's exact test was 1 × 10⁻³.

T-cell activation by transitory neo-antigens derived
from distinct microbial pathways


Alexandra J. Corbett, Sidonia B. G. Eckle, Richard W. Birkinshaw, Ligong Liu, Onisha Patel, Jennifer Mahony,
Zhenjun Chen, Rangsima Reantragoon, Bronwyn Meehan, Hanwei Cao, Nicholas A. Williamson, Richard A. Strugnell,
Douwe Van Sinderen, Jeffrey Y. W. Mak, David P. Fairlie, Lars Kjer-Nielsen, Jamie Rossjohn & James McCluskey


T cells discriminate between foreign and host molecules by recognizing distinct
microbial molecules, predominantly peptides and lipids. Riboflavin precursors found
in many bacteria and yeast also selectively activate mucosal-associated invariant T
(MAIT) cells, an abundant population of innate-like T cells in humans. However, the
genesis of these small organic molecules and their mode of presentation to MAIT
cells by the major histocompatibility complex (MHC)-related protein MR1 (ref. 8)
are not well understood. Here we show that MAIT-cell activation requires key genes
encoding enzymes that form 5-amino-6-D-ribitylaminouracil (5-A-RU), an early
intermediate in bacterial riboflavin synthesis. Although 5-A-RU does not bind MR1
or activate MAIT cells directly, it does form potent MAIT-activating antigens via
non-enzymatic reactions with small molecules, such as glyoxal and methylglyoxal,
which are derived from other metabolic pathways. The MAIT antigens formed by the
reactions between 5-A-RU and glyoxal/methylglyoxal were simple adducts,
5-(2-oxoethylideneamino)-6-D-ribitylaminouracil (5-OE-RU) and
5-(2-oxopropylideneamino)-6-D-ribitylaminouracil (5-OP-RU), respectively, which
bound to MR1 as shown by crystal structures of MAIT TCR ternary complexes. Although
5-OP-RU and 5-OE-RU are unstable intermediates, they became trapped by MR1 as
reversible covalent Schiff base complexes. Mass spectra supported the capture by
MR1 of 5-OP-RU and 5-OE-RU from bacterial cultures that activate MAIT cells, but
not from non-activating bacteria, indicating that these MAIT antigens are present
in a range of microbes. Thus, MR1 is able to capture, stabilize and present
chemically unstable pyrimidine intermediates, which otherwise convert to lumazines,
as potent antigens to MAIT cells. These pyrimidine adducts are microbial signatures
for MAIT-cell immunosurveillance.

MAIT-cell antigens were previously identified from Salmonella typhimurium (strain
SL1344) supernatant. Negative-mode electrospray ionization-time-of-flight mass
spectrometry (ESI-TOF-MS) analysis of MR1-bound ligands from S. typhimurium
revealed a ligand with a mass to charge (m/z) ratio of 329.11, matching a potent
MAIT-activating ligand identified during the chemical synthesis of reduced
6-hydroxymethyl-8-D-ribityllumazine (rRL-6-CH2OH) (ref. 5). Although this ligand
was identified biochemically, its origin was puzzling, as it is not described in
the riboflavin synthesis pathway. We therefore took a genetic approach to evaluate
whether the riboflavin pathway supplied the MAIT-cell ligands.

We examined the capacity of bacterial mutants of the riboflavin pathway to activate
MAIT cells (Fig. 1a). In some bacterial species, including Lactococcus lactis, the
genes necessary for riboflavin synthesis are grouped together in a single four-gene
operon (RibGBAH), and are regulated by transcriptional repression of a ‘riboswitch’
via flavin mononucleotide and riboflavin. Using L. lactis, we tested the ability of
bacterial culture supernatant to activate Jurkat cells transduced with a MAIT
T-cell antigen receptor (TCR) (Jurkat.MAIT) (Fig. 1b). Supernatant from wild-type
L. lactis strain NZ9000 incubated with antigen-presenting cells expressing MR1
caused CD69 upregulation in Jurkat.MAIT cells (Fig. 1b). Addition of riboflavin
during culture of NZ9000 abrogated Jurkat.MAIT-cell activation, consistent with
negative regulation of the riboswitch and impaired production of the activating
MAIT ligand (Fig. 1b). Next, three mutant strains of L. lactis were used: two
riboflavin overproducers, CB013 and CB021, which produce riboflavin even in the
presence of high riboflavin concentrations; and a RibA⁻ strain, which contains a
deletion in ribA, early in the riboflavin pathway. The riboflavin overproducers
activated Jurkat.MAIT cells when grown with or without exogenous riboflavin,
whereas there was no MAIT-cell activation by supernatant from the RibA⁻ strain
(Fig. 1b). Accordingly, the riboflavin pathway is necessary and sufficient to
produce natural MAIT-cell antigens.

Next we generated individual mutations in the four genes of the riboflavin operon
in L. lactis. These were produced using the constitutive riboflavin overproducer
strain CB013. Culture supernatants from bacteria with mutant riboflavin pathways
were tested for activation of Jurkat.MAIT cells (Fig. 1c). The parental CB013
supernatant activated Jurkat.MAIT cells, whereas bacteria containing mutations in
ribA or ribG did not activate the reporter cells under similar conditions
(Fig. 1c). Neither ribB nor ribH mutations, which affect the pathway downstream of
5-A-RU, had any impact on Jurkat.MAIT activation (Fig. 1c). Moreover, whereas the
m/z 329.11 species was undetectable in MR1 refolded with supernatant from the RibA
and RibG mutants, it was captured by MR1 from the supernatants of the RibB and RibH
mutants of L. lactis (Extended Data Fig. 1a). Moreover, culture supernatants from
S. typhimurium SL1344 with mutated RibD and RibH (RibD+H) did not furnish a
detectable m/z 329.11 species that bound MR1, and could not activate Jurkat.MAIT
cells (Extended Data Fig. 1b, c). However, antigen was detectable in the
complemented, activating RibD+H SL1344 mutant (Extended Data Fig. 1b, c).
MR1-restricted ligands were not detected from the supernatant of Enterococcus
faecalis, which neither possesses the riboflavin pathway nor activates MAIT cells
(Extended Data Fig. 1d and data not shown). Analysis of MR1-bound ligands from
another MAIT-activating strain, Escherichia coli (DH5α strain), also revealed a
ligand with an m/z ratio of 329.11 (Extended Data Fig. 1e). These data are
consistent with MAIT-activating ligands, from a number of bacterial sources, being
derived through an unknown mechanism from 5-A-RU.

A key precursor step in riboflavin biosynthesis is the condensation 
of 5-A-RU (1) with 3,4-dihydroxy-2-butanone-4-phosphate (2a) to 

Figure 1 | The riboflavin pathway furnishes ligands that activate MAIT cells.
a, Riboflavin biosynthesis pathway. ribH, lumazine synthase; X, hypothetical
phosphatase. b, Cells were incubated overnight with filtered supernatant (S/N)
from L. lactis NZ9000 (wild type), RibA⁻ (RibA deletion mutation), CB013 and CB021
(riboflavin overproducers) overnight cultures plus or minus 3 µg ml⁻¹ riboflavin,
then stained with anti-CD3 monoclonal antibody coupled to PE and anti-CD69
monoclonal antibody coupled to APC. Mean fluorescence intensity (MFI)
anti-CD69-APC for gated Jurkat.MAIT cells. Data are shown as mean ± standard error
of the mean (s.e.m.). c, Cells were incubated overnight with 10 µl filtered culture
S/N from Lactococcus CB013 (deregulated riboflavin operon; wild type (WT)),
CB013ΔRibA, CB013ΔRibB, CB013ΔRibG and CB013ΔRibH or S. typhimurium, then stained
with anti-CD3-PE and anti-CD69-APC. MFI anti-CD69-APC for gated Jurkat.MAIT cells.
Data are shown as mean ± s.e.m. Experiments were performed at least three times.

5-OP-RU (3d), respectively, and then RL-6,7-DiMe (4b = 4a), RL (4c) and RL-7-Me
(4d), respectively. b, Two-dimensional NMR spectrum (heteronuclear multiple bond
correlation (HMBC)) of isolated 5-OP-RU (3d) in DMSO-d6 showing key long-range
correlations that unambiguously characterize the imine adduct (3d), also
identified in aqueous media (pH > 6). J refers to heteronuclear coupling through
1, 2 or 3 bonds.


Figure 3 | Structural basis of lees and recognition of transitory 
MAIT-cell antigens. a—c, MAIT TCR-MR1-antigen docking (a), MAIT TCR 
footprint on MR1 surface (b) and 5-OP-RU and RL-6-Me-7-OH overlay 

(c). d-h, MR1 contacting 5-OP-RU (d) and 5-OE-RU (e); MAIT TCR 
contacting RL-6-Me-7-OH (f), 5-OP-RU (g) or 5-OE-RU (h). 


generate an intermediate 5-(1-methyl-2-oxopropylideneamino)-6-D-ribitylaminouracil
(5-MOP-RU; 3a), which readily undergoes ring closure with dehydration to form
6,7-dimethyl-8-D-ribityllumazine (RL-6,7-DiMe; 4a) (Fig. 2a), a biosynthesis that
is catalysed by lumazine synthase (RibH). However, RL-6,7-DiMe can also be
generated in the absence of lumazine synthase, suggesting that MAIT antigens might
be formed through spontaneous reactions of 5-A-RU with other small molecules
through non-enzymatic mechanisms (Fig. 2a). For example, butane-2,3-dione (2b),
glyoxal (2c) and methylglyoxal (also known as pyruvaldehyde; 2d) can represent
by-products arising from a number of metabolic pathways, including glycolysis.
Their condensations with 5-A-RU (1) would respectively produce the pyrimidine
adducts 5-MOP-RU (3b = 3a), 5-OE-RU (3c) and 5-OP-RU (3d) en route to
ribityllumazines RL-6,7-DiMe (4b = 4a), 8-D-ribityllumazine (RL; 4c) and
7-methyl-8-D-ribityllumazine (RL-7-Me; 4d), respectively. We found that the initial
adducts 3b-d were formed almost immediately, but readily underwent dehydration upon
ring closure to form the stable, isolatable compounds 4b-d (Extended Data Figs 7
and 8), without the need for enzyme catalysis. Adducts 3b-d (Fig. 2a) were
especially unstable under acidic aqueous conditions (pH < 6), but we could detect
them in solution under physiological conditions. We were able to synthesize 3d in
deuterated dimethylsulphoxide (DMSO-d6), isolate it, unambiguously assign its
solution structure by NMR spectroscopy (Fig. 2b and Extended Data Fig. 2), and
examine its stability in aqueous media using liquid chromatography-mass
spectrometry (LC-MS). At 37 °C and pH 6.8, adduct 3d was clearly formed and had a
half-life (t1/2) of around 2 h at 65 µM. It was more stable at lower temperatures
(for example, t1/2 14-15 h at 15 °C, pH 6.8-8.0, 65-250 µM) (Extended Data Fig. 3).
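
To give a feel for what these half-lives imply, the following sketch computes the fraction of adduct 3d expected to remain after a given time. It assumes simple first-order decay, which the text does not state; the function name is hypothetical, and only the half-life values are taken from the text.

```python
def fraction_remaining(elapsed_h, half_life_h):
    """Fraction of starting material left after `elapsed_h` hours.

    Assumes first-order (exponential) decay, an assumption of this sketch.
    """
    return 0.5 ** (elapsed_h / half_life_h)


# Half-lives quoted in the text: about 2 h at 37 °C, 14-15 h at 15 °C.
print(fraction_remaining(2, half_life_h=2))    # two hours at 37 °C
print(fraction_remaining(2, half_life_h=14))   # two hours at 15 °C
```

Under this assumption, half of the adduct would be lost in 2 h at 37 °C, whereas roughly 90% would still be present after the same time at 15 °C.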

MR1 is depicted in grey, α-chain in purple, β-chain in cyan. 5-OP-RU is shown
in green, 5-OE-RU in yellow, RL-6-Me-7-OH in magenta, CDR1α in slate,
CDR2α in pink, CDR3α in yellow, CDR1β in teal, CDR2β in red and CDR3β
in orange.


We considered whether butane-2,3-dione (2b), glyoxal (2c) and methylglyoxal (2d),
in the presence of 5-A-RU (1), might react spontaneously to facilitate MR1
refolding. MR1-5-A-RU was undetectable when MR1 was refolded with 5-A-RU alone
(data not shown), and the presence of 5-A-RU and butane-2,3-dione failed to yield
any MR1-antigen complexes (data not shown). However, refolding of MR1 in the
presence of 5-A-RU and either glyoxal or methylglyoxal led to a correctly folded
MR1-antigen complex. To understand the basis for ligand selectivity by MR1, we
determined the structures of the MAIT TCR in complex with MR1 and antigens formed
from the condensation of 5-A-RU and either methylglyoxal or glyoxal (Fig. 3a-h,
Extended Data Table 1 and Extended Data Fig. 4). Surprisingly, the chemically
unstable adducts 5-OP-RU (compound 3d; Fig. 2a) and 5-OE-RU (compound 3c; Fig. 2a)
were observed bound to MR1 (Extended Data Fig. 4 and Fig. 3d, e). Both of these
one-ring (pyrimidine) compounds were thus captured by MR1, despite being relatively
unstable in the absence of MR1 and readily undergoing dehydrative cyclization to
compounds 4d and 4c, respectively. The aromatic pyrimidine ring systems of 5-OP-RU
and 5-OE-RU superposed on the corresponding ring from the bicyclic lumazine
7-hydroxy-6-methyl-8-D-ribityllumazine (RL-6-Me-7-OH) (ref. 16) (Fig. 3c). The
creation of 5-OP-RU or 5-OE-RU generated an aliphatic moiety that burrowed into the
MR1 cleft, within which the residual carbonyl group formed a Schiff base with
Lys 43 of MR1 (Fig. 3d, e). This aliphatic moiety was also stabilized in the cleft
by interactions with Tyr 7 and Tyr 62 (Fig. 3d, e). By contrast, RL-6-Me-7-OH was
non-covalently bound within MR1 (Fig. 3f). Moreover, RL-6-Me-7-OH does not have
the propensity to tautomerize into a single-ring pyrimidine system owing to its
ability to form a very stable amide tautomer. Nevertheless, the ribityl
moieties of 5-OP-RU, 5-OE-RU and RL-6-Me-7-OH were all located in essentially
identical positions within their respective complexes, each forming a hydrogen bond
to Tyr 95α of the MAIT TCR (Fig. 3f-h). Notably, 5-OP-RU and 5-OE-RU are relatively
unstable in aqueous media and thus MR1 can capture and stabilize pyrimidine
intermediates in the synthesis of lumazines.

Next we undertook ESI-TOF-MS to identify independently the chemical composition of
the ligands captured within these refolded MR1-antigen complexes. For MR1 refolded
with 5-A-RU and methylglyoxal, a single peak with a retention time of 8.9 min and
m/z 329.11 matched a species that was captured by MR1 from Salmonella supernatant
and from the reaction mixture during synthesis of rRL-6-CH2OH (ref. 5) (Extended
Data Fig. 5a, b). This finding is consistent with the identification, within the
crystal structure with MR1, of 5-OP-RU independently assembled from 5-A-RU and
methylglyoxal (Fig. 3d), and supported by the NMR and kinetic characterization of
5-OP-RU in solution (Fig. 2b and Extended Data Figs 2, 3). Similarly, mass
spectrometric analysis of MR1 refolded with the mixture of 5-A-RU and glyoxal
revealed precursor and product m/z values (Extended Data Fig. 5a, b) consistent
with identification of 5-OE-RU within the crystal structure of MR1 refolded with
5-A-RU and glyoxal (Fig. 3e, h). Furthermore, mass spectrometric analysis of MR1
refolded with 5-A-RU and 13C-labelled glycolaldehyde yielded the expected
m/z 317.10 precursor and 179.04 product ions, in agreement with the m/z 315.09
precursor and 177.04 product ions identified in MR1 refolded with 5-A-RU and
glyoxal (Extended Data Fig. 5a, b).
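
The size of the isotope shift can be checked with a short calculation. The sketch below assumes that the labelled reagent carries two 13C atoms, that both are retained in the precursor and product ions, and that the ions are singly charged. None of this is stated in the text; the assumptions are chosen because they reproduce the reported shift of about 2 m/z units. The function name is hypothetical.

```python
# Mass difference between 13C and 12C, in daltons.
C13_MINUS_C12 = 1.003355


def labelled_mz(unlabelled_mz, n_labels, charge=1):
    """Expected m/z after replacing `n_labels` 12C atoms with 13C."""
    return unlabelled_mz + n_labels * C13_MINUS_C12 / abs(charge)


# Unlabelled values quoted in the text; two labels is an assumption.
print(round(labelled_mz(315.09, n_labels=2), 2))  # precursor ion
print(round(labelled_mz(177.04, n_labels=2), 2))  # product ion
```

Under these assumptions the predicted values lie within 0.01 m/z units of the reported 317.10 precursor and 179.04 product ions.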

We asked whether the activity with synthetic rRL-6-CH2OH (ref. 5) might reflect
capture by MR1 of a synthetic intermediate. The ligand captured by the mutant
MR1(K43A) exposed to the reaction mixture generating rRL-6-CH2OH (ref. 5) was
identical, as identified by LC-MS and MS/MS analysis (m/z 329.11), to the MR1-bound
antigen from either the Salmonella supernatant or derived from
5-A-RU/methylglyoxal condensation (data not shown, Extended Data Fig. 5a). We
therefore evaluated whether the respective MR1 tetramers formed from these distinct
synthetic antigens were similar functionally. MR1-5-OP-RU and MR1-5-OE-RU tetramers
efficiently stained all human MAIT cells present in peripheral blood mononuclear
cells (PBMCs), similarly to the mutant MR1(K43A) tetramers (Fig. 4a, b). We solved
the crystal structure of the MAIT TCR-MR1(K43A)-antigen complex, which revealed
5-OP-RU as the ligand bound to MR1(K43A), indicating that MR1 captures an
intermediate from the synthesis of rRL-6-CH2OH (Extended Data Fig. 4 and Extended
Data Table 1). Thus, active MAIT-cell ligands are intermediary, open-ring
precursors to ribityllumazines that arise from condensing 5-A-RU with small
molecule metabolites.

Figure 4 | MR1-antigen tetramers and MAIT activation. a, Gating strategy (left),
tetramers of MR1-6-FP, MR1-5-OP-RU, MR1-5-OE-RU or anti-TRAV1-2. DN, double
negative. b, PBMC co-staining with MR1-5-OP-RU and MR1(K43A)-5-OP-RU tetramers.
WT, wild type. c, Extracted ion chromatograms (EICs) of m/z 315.09 or m/z 317.10.
d, Left, activation assay and, right, MR1 upregulation with 5-A-RU, methylglyoxal
(MG), butane-2,3-dione (BD), glyoxal (G) or rRL-6-CH2OH. MFI of monoclonal antibody
26.5-PE staining. e, Left, CD69 upregulation and, right, MR1 upregulation with
5-A-RU, MG, BD or G. MFI of 26.5-PE. f, Activation of wild-type, Y95A- or
Y95F-mutant SKW.MAIT cells by 5-A-RU. Data are shown as mean of triplicates plus
s.e.m. Experiments were performed at least twice (b, c, f) or three times (a, d, e).

364 | NATURE | VOL 509 | 15 MAY 2014

Recombinant MR1 refolded in the presence of folate-deficient culture supernatant
from S. typhimurium SL1344 captured a dominant species of m/z 329.11 (ref. 5). Mass
spectrometry of MR1 refolded with supernatant from E. coli (DH5α) also revealed a
distinct and abundant m/z 315.09 species with matching liquid chromatography
retention time, mass spectrometry and tandem mass spectrometry (MS/MS) properties
to those observed with MR1-5-OE-RU (Extended Data Fig. 6 and data not shown).
Closer analysis of the MR1 eluate from S. typhimurium also revealed the presence of
an m/z 315.09 species, although this ligand was much less prevalent (data not
shown). Accordingly, bacteria with an active riboflavin pathway can produce
distinct MAIT-activating ligands, the relative abundance of which is dependent upon
the bacterial source.

Next, to establish if bacteria also produced free 5-A-RU, 13C-labelled
glycolaldehyde was added to E. coli supernatant, which was subsequently refolded
with MR1. We detected a species with an m/z of 317.1 from these MR1 eluates
(Fig. 4c), consistent with the m/z 317.1 species observed previously (Extended Data
Fig. 5). This indicates that there is sufficient free 5-A-RU released by bacteria
to conjugate with exogenously added metabolites. Potent MAIT-cell antigens could
also potentially be generated by host-derived metabolites forming adducts with
5-A-RU, in a manner somewhat analogous to the genesis of a CD1b-restricted antigen.
To test this, we added 5-A-RU to C1R cells transduced with MR1 (C1R.MR1), which led
to MR1 cell surface upregulation and activation of Jurkat.MAIT cells (Fig. 4d, e).
When exogenous glyoxal or methylglyoxal was added with 5-A-RU to C1R.MR1 cells, we
observed a further increase in MR1 upregulation and an increase in Jurkat.MAIT
activation, when compared with 5-A-RU added by itself (Fig. 4d, e). Notably, MR1
surface expression was not enhanced, nor was there an increase in Jurkat.MAIT
activation, upon co-addition of butane-2,3-dione with 5-A-RU (Fig. 4d, e). These
observations suggest that MR1-antigen complexes created from 5-A-RU and glyoxal or
methylglyoxal are natively conformed. In support of this, mutation of Tyr 95α to
either Ala 95 or Phe 95 ablated recognition of C1R cells to which 5-A-RU had been
added, in a manner similar to that observed when synthetic rRL-6-CH2OH was added to
C1R cells, consistent with the notion that 5-A-RU is converted to 5-OE-RU or
5-OP-RU within C1R antigen-presenting cells (Fig. 4f). Accordingly, the bacterial
riboflavin metabolite 5-A-RU can interact with host-derived metabolites in a manner
functionally analogous to the creation of MR1 ligands found in bacterial
supernatant.

We show a unique mechanism for creating T-cell ligands from disparate
metabolite building blocks. The potent MAIT-activating ligands
arise from a ‘core precursor element’ of the microbial riboflavin path- 
way that forms simple adducts with distinct chemical metabolites, via a 
mechanism that does not require enzymatic catalysis. Thus, MR1 cap- 
tures, stabilizes and presents otherwise transitory chemical intermediates 
for MAIT-cell recognition. This represents a sophisticated discriminatory 
mechanism for targeting microbial antigens and protecting the host, 
whereby distinct metabolic pathways converge to produce T-cell antigens. 


METHODS SUMMARY 


L. lactis strains NZ9000 (wild type), the NZ9000 RibA deletion mutant, and the
CB013 and CB021 roseoflavin-resistant mutants have been previously described’.
S. typhimurium strains SL1344 and BRD509 have been previously described’*. See
Methods for full details of CB013 derivatives and Salmonella ΔribDH mutants. For
details on chemical syntheses, characterization and stability, see Methods, Extended
Data Figs 2, 3, 7, 8 and Supplementary Information. MAIT-activation and
MR1-upregulation assays, MR1 refolding, and mass spectrometry analysis were conducted
as described previously’. MR1(K43A) tetramers were generated as described’. See
Methods for MR1-5-OP-RU and MR1-5-OE-RU tetramers. MR1 tetramer staining
was conducted as described previously*. The MAIT TCR in complex with MR1-5-OP-RU,
MR1-5-OE-RU and MR1(K43A)-5-OP-RU were crystallized and the structures
were determined, as described in Methods.
METHODS

Bacterial strains and mutants. L. lactis strains NZ9000 (wild type), the NZ9000
RibA deletion mutant, and the CB013 and CB021 roseoflavin-resistant mutants have
been previously described". The CB013 derivatives CB013ΔRibA, CB013ΔRibB,
CB013ΔRibG and CB013ΔRibH were generated by insertion of EcoRI or EcoRV
restriction sites incorporating either one or two stop codons into the individual
genes using standard techniques. Inserted sequences were as follows (indicated in
uppercase). ΔRibG, attaacgtttccccctccttttcgagccagtGAATTCaggattgctaaattcataaaatgctcatcattttccat;
ΔRibB, gatgctggaagctcgaatgattaatttagacgGAATTCATTAacttatctctcttttgaatttgagttacctctcctat;
ΔRibA, tatcttttctggactaatcatttcggctgcacGAATTCaaatcagatctccttcattttctctattctcatcatc;
ΔRibH, ggcccctgctaagagttttgccgttgatgaaGATATCTTATTAttcgttaaaacgtgcaactacaattccatatttgg.
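
The inserted sequences above mark new bases in uppercase within lowercase flanking sequence. As a minimal sketch (the function names are illustrative, and the two fragments are short excerpts of the ΔRibG and ΔRibH sequences with whitespace removed), the uppercase insert can be extracted and checked for the EcoRI (GAATTC) or EcoRV (GATATC) recognition site:

```python
import re

# Recognition sequences of the two enzymes named in the text.
RESTRICTION_SITES = {"EcoRI": "GAATTC", "EcoRV": "GATATC"}


def inserted_segment(annotated: str) -> str:
    """Return the single uppercase run (the inserted bases) of an annotated sequence."""
    runs = re.findall(r"[ACGT]+", annotated)
    if len(runs) != 1:
        raise ValueError(f"expected exactly one uppercase insert, found {len(runs)}")
    return runs[0]


def sites_in(insert: str) -> list[str]:
    """Names of the restriction sites whose recognition sequence occurs in the insert."""
    return [name for name, site in RESTRICTION_SITES.items() if site in insert]


# Short excerpts around the inserts (flanks truncated for illustration).
rib_g = "gagccagtGAATTCaggattgc"
rib_h = "gttgatgaaGATATCTTATTAttcgttaaa"

for label, seq in (("ΔRibG", rib_g), ("ΔRibH", rib_h)):
    insert = inserted_segment(seq)
    print(label, insert, sites_in(insert))
```

A check of this kind only confirms that the expected recognition site is present; it does not verify reading frame or stop-codon placement, which depend on the gene context.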

The genotype of each mutant was verified by sequencing and multiple rounds of 
PCR based on the mutated region to verify the purity of the genotype. The pheno- 
type of each mutant was also checked using growth or absence of growth in ribo- 
flavin assay medium overnight at 30 °C and comparing with a control of CB013. 
All mutants were unable to grow in the media whereas CB013 was capable of growth 
in riboflavin-limiting conditions as it is an over-producer. L. lactis strains were 
grown at 30 °C without shaking in M17 medium (Difco) containing 1% glucose 
and the addition of 3 µg ml−1 riboflavin where indicated. S. typhimurium SL1344
was grown at 37 °C without shaking in M17 medium (Difco) containing 1% glucose. 

S. typhimurium strains SL1344 and BRD509 have been previously described’*.
The Salmonella ΔribDH mutants were constructed on an SL1344 background by
lambda-red-recombinase-mediated allelic replacement followed by general transduction
using phage P22 as previously described”, resulting in strain SL1344ΔRibDH.

Primers were as follows. B2(Sec)F, 5′-TAGGGATAACAGGGTAAT-GGTTCGATAGCGTAATGG;
B2(Sec)R, 5′-TAGGGATAACAGGGTAAT-TATCTTTCCGGCCTGTGA;
B2(Kan)F, 5′-CTAAGGAGGATATTCATATG-GACCGCGCTTGAAATGAT;
B2(Kan)R, 5′-GAAGCAGCTCCAGCCTACACA-ATTGTTAACAATGACACA.

Complementation of the mutants was performed by transformation with the ribDH genes,
resulting in strain SL1344ΔRibDH:RibDH. Mutation and reconstitution were
verified by lack of growth or growth on Luria agar, and by PCR. Mutants were grown
on Luria agar containing 20 µg ml−1 riboflavin.

For MR1 refolds, Salmonella wild-type and mutant strains were grown in M9 minimal
media supplemented with histidine (77.6 µg ml−1) and streptomycin (25 µg ml−1)
and 3 µg ml−1 riboflavin. E. faecalis was grown in Folic Acid Assay Medium (Difco)
at 37 °C without agitation. E. coli DH5 was grown in M9 media. L. lactis CB013
and CB013 riboflavin mutants were grown in Folic Acid Assay Medium (Difco)
supplemented with xanthine (6 µg ml−1) and Yeast Nitrogen Base (6.8 mg ml−1) at 30 °C
without agitation.
Compounds. Glyoxal, methylglyoxal, 1,3-dihydroxyacetone dimer, DL-glyceraldehyde
and butane-2,3-dione were purchased from Sigma. [1,2-13C2]glycolaldehyde
(glycolaldehyde can be readily air-oxidized to form glyoxal) was purchased from
Omicron Biochemicals. A synthesis of rRL-6-CH2OH has been previously described’.
5-A-RU was freshly prepared from 5-nitroso-6-D-ribitylaminouracil following a
previously described procedure”. In brief, 5-nitroso-6-D-ribitylaminouracil (40.0 mg,
0.138 mmol, 1 eq) was dissolved in MilliQ water (3 ml) at 80 °C under argon. To the
red solution was added sodium dithionite powder (1.2-3.3 eq). The colour changed
instantly to pale yellow. After stirring at 80 °C for 5 min, the solution was cooled
under argon in an ice-water bath. For biological studies, the chilled solution was
diluted with MilliQ water to make 50 mM stock solutions and stored in 1.5 ml aliquots
at −20 °C for later use.

For NMR characterization of the pyrimidine intermediate 5-OP-RU (3d), a
freshly prepared solution of 5-A-RU (1) was adjusted to pH 7.0 with 1 M sodium
hydroxide solution, lyophilized, dissolved in DMSO-d6 and then filtered to remove
salts. The solution was transferred to an NMR tube, filled with argon, and the
concentration of 5-A-RU was determined by NMR spectroscopy. Methylglyoxal (2 eq)
was added, and the reaction monitored by NMR. Upon completion, 5-OP-RU was
further purified using a Shimadzu preparative HPLC system equipped with a
Phenomenex Luna 10 µm C18 250 × 21.20 mm column (P/No 00G-4253-PO-AX)
and a SPD-M20A diode array detector. Flow rate was 20 ml min−1 with a linear
gradient: 100% solvent A to 100% solvent B over 30 min, where solvent A was 20 mM
ammonium acetate in H2O and solvent B was 20 mM ammonium acetate in MeCN-H2O
(80:20, v/v). Compound 3d was fully characterized by ESI-HRMS (calculated
for C12H17N4O7 m/z 329.1103, measured m/z 329.1116) and one-dimensional
and two-dimensional NMR spectroscopy (Fig. 2b and Extended Data Fig. 2).

5-OP-RU (3d) 1H NMR (600 MHz, DMSO-d6), δ 2.28 (3H, s), 3.38-3.43 (2H, m),
3.47-3.51 (1H, m), 3.52-3.55 (1H, m), 3.56-3.59 (2H, m), 3.73 (1H, m), 7.43 (1H, br
s), 8.80 (1H, s); 13C NMR (150 MHz, DMSO-d6) δ 23.5, 44.1, 63.1, 70.7, 72.8, 72.9,
98.5, 142.0, 152.1, 157.6, 159.1, 200.2. ESI-HRMS calculated for C12H17N4O7
[M−H]−: 329.1103, found: 329.1116.
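
The calculated [M−H]− value can be reproduced from standard monoisotopic masses. The sketch below is illustrative only: the function names are not from the paper, and the composition C12H17N4O7 is taken as the formula of the anion itself, so only one electron mass is added.

```python
# Monoisotopic masses (u) of the most abundant isotopes, plus the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858


def anion_mz(composition: dict[str, int], charge: int = 1) -> float:
    """m/z of a negative ion whose elemental composition is given as written.

    The composition describes the ion itself (for [M-H]- the proton is already
    removed), so only the mass of the extra electron(s) is added.
    """
    atoms = sum(MONOISOTOPIC[element] * count for element, count in composition.items())
    return (atoms + charge * ELECTRON) / charge


def ppm_error(measured: float, calculated: float) -> float:
    """Relative deviation of a measured m/z from the calculated one, in ppm."""
    return (measured - calculated) / calculated * 1e6


calc = anion_mz({"C": 12, "H": 17, "N": 4, "O": 7})
print(f"calculated m/z {calc:.4f}")
print(f"deviation of measured 329.1116: {ppm_error(329.1116, calc):.1f} ppm")
```

Under these assumptions the calculated value rounds to 329.1103, and the measured 329.1116 lies roughly 4 ppm above it.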


Stability of 5-OP-RU in aqueous media. Purified 5-OP-RU was dissolved in
aqueous TBS buffer (10 mM Tris, 150 mM NaCl, pH 8.0), MilliQ water (pH 6.8),
or aqueous ammonium acetate buffer (20 mM, pH 5.4). The consumption of 5-OP-RU
was immediately monitored by LC-MS. The initial concentration was quantified
by comparing with a standard solution of known concentration. At 15 °C, the
half-life was determined as 14.5-15 h at pH 8.0 independent of the starting
concentrations (65-250 µM), 14.2 h at pH 6.8 (65 µM), and 49 min at pH 5.4 (65 µM).
At 37 °C, pH 6.8, the half-life was 135 min (Extended Data Fig. 3).
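
A half-life that does not depend on starting concentration is the signature of first-order decay. Assuming first-order kinetics (an assumption of this sketch, not a statement from the text), the reported half-lives translate into the fraction of 5-OP-RU remaining after a given time; the 4 h time point below is chosen purely for illustration.

```python
import math


def rate_constant(half_life: float) -> float:
    """First-order rate constant k = ln 2 / t_half (per time unit of half_life)."""
    return math.log(2) / half_life


def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of compound left after `elapsed`, in the same time unit as `half_life`."""
    return math.exp(-rate_constant(half_life) * elapsed)


# Half-lives reported in the text, converted to minutes.
# For pH 8.0 the upper end of the reported 14.5-15 h range is used.
HALF_LIVES_MIN = {
    "pH 8.0, 15 °C": 15 * 60,
    "pH 6.8, 15 °C": 14.2 * 60,
    "pH 5.4, 15 °C": 49,
    "pH 6.8, 37 °C": 135,
}

for condition, t_half in HALF_LIVES_MIN.items():
    left = fraction_remaining(240, t_half)  # 240 min is an arbitrary example
    print(f"{condition}: {left:.1%} remaining after 4 h")
```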

Activation of Jurkat.MAIT and SKW.MAIT cells and detection of MR1
expression on C1R.MR1 cells. Jurkat cells transduced with genes encoding a MAIT
TCR comprising the TRAV1-2-TRAJ33 invariant α chain, and a TRBV6-1 β chain,
or SKW cells transduced with genes encoding the TRAV1-2-TRAJ33 invariant α
chain with either wild-type Tyr 95 or mutated Tyr95Ala or Tyr95Phe residues,
paired with a TRBV6-1 β chain, were tested for activation by co-incubation with
bacterial culture supernatant or compounds and C1R antigen-presenting cells
expressing MR1 (C1R.MR1, with Jurkat.MAIT cells), or C1R cells (SKW.MAIT
cells) for 16 h. Cells were subsequently stained with PE-Cy7-conjugated anti-CD3
(eBioscience), and APC-conjugated anti-CD69 (BD) antibodies as well as biotinylated
anti-MR1 monoclonal antibody 26.5 (ref. 21), followed by Streptavidin-PE
(BD), before analysis by flow cytometry. Activation of Jurkat.MAIT or SKW.MAIT
cells was measured by an increase in surface CD69 expression. MR1 expression
was detected on gated C1R.MR1 cells in the same assay.

Preparation of denatured inclusion body MR1 and β2M. Genes encoding soluble
human MR1 or human β2M were expressed for 4 h in BL21 E. coli after
induction with 1 mM isopropyl β-D-1-thiogalactopyranoside. E. coli were pelleted
and resuspended in a buffer containing 50 mM Tris, 25% (w/v) sucrose, 1 mM
EDTA, 10 mM dithiothreitol (DTT) pH 8.0. Inclusion body protein was then extracted
by lysis of bacteria in a buffer containing 50 mM Tris pH 8.0, 1% (w/v) Triton
X-100, 1% (w/v) sodium deoxycholate, 100 mM NaCl, 10 mM DTT, 5 mM MgCl2
and 1 mg DNase I per litre of starting culture; subsequent steps involved
homogenization with a polytron homogenizer, centrifugation and washing inclusion body
protein sequentially with first a buffer containing 50 mM Tris pH 8.0, 0.5% Triton
X-100, 100 mM NaCl, 1 mM EDTA, 1 mM DTT, and second a buffer containing
50 mM Tris pH 8.0, 1 mM EDTA, 1 mM DTT. Inclusion body protein was then
resuspended in a buffer containing 20 mM Tris pH 8.0, 8 M urea, 0.5 mM EDTA,
1 mM DTT, and following centrifugation the supernatant containing solubilized,
denatured inclusion body protein was collected and stored at −80 °C.

Refolding of MR1 ligand and MAIT TCR. As MR1 cannot be expressed in a
soluble form in the absence of ligand, MR1 (the ectodomain) and β2M were refolded
with ligand essentially as described’. Briefly, in order to generate MR1-5-OP-RU
and MR1-5-OE-RU, 56 mg of MR1 and 28 mg of β2M inclusion body proteins,
together with 2.9 mg of 5-A-RU and 254, 204 or 204 mg of methylglyoxal, glyoxal
(Sigma) or 13C-glycolaldehyde (Omicron Biochemicals), respectively, were added
to a 400 ml refold solution containing 0.1 M Tris, pH 8.5, 2 mM EDTA, 0.4 M
arginine, 0.5 mM oxidized glutathione and 5 mM reduced glutathione. Refolded
MR1 antigen was then purified by sequential DEAE (GE Healthcare) anion exchange,
S75 16/60 (GE Healthcare) gel filtration, and MonoQ (GE Healthcare) anion exchange
chromatography. Alternatively, 56 mg of MR1 and 28 mg of β2M inclusion body
proteins were refolded in the presence of 400 ml of 0.45 µm filtered bacterial
supernatants or control media, in the absence or presence of 204 mg 13C-glycolaldehyde.
The TRBV6-1 MAIT TCR (the ectodomains) was expressed, refolded and purified
essentially as previously described”.

Sequences of constructs used in refolding are as follows. Soluble TRAV1-2 α-chain
amino acid sequence, excluding the transmembrane/cytoplasmic domains:
MGQNIDQPTEMTATEGAIVQINCTYQTSGFNGLFWYQQHAGEAPTFLSYNVLD
GLEEKGRFSSFLSRSKGYSYLLLKELQMKDSASYLCAVKDSNYQLIWGAGTK
LITIKPDIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKC
VLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESS.

Soluble TRBV6-1 β-chain amino acid sequence, excluding the transmembrane/
cytoplasmic domains:
MNAGVTQTPKFQVLKTGQSMTLQCAQDMNHNSMY
WYRQDPGMGLRLIYYSASEGTTDKGEVPNGYNVSRLNKREFSLRLESAAPS
QTSVYFCASSVWTGEGSGELFFGEGSRLTVLEDLKNVFPPEVAVFEPSEAEIS
HTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVCTDPQPLKEQPALN
DSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQ
IVSAEAWGRAD.

Soluble human MR1 amino acid sequence, excluding the transmembrane/
cytoplasmic domains:
MRTHSLRYFRLGVSDPIHGVPEFISVGYVD
SHPITTYDSVTRQKEPRAPWMAENLAPDHWERYTQLLRGWQQMFKVEL
KRLQRHYNHSGSHTYQRMIGCELLEDGSTTGFLQYAYDGQDFLIFNKDTL
SWLAVDNVAHTIKQAWEANQHELLYQKNWLEEECIAWLKRFLEYGKDT
LQRTEPPLVRVNRKETFPGVTALFCKAHGFYPPEIYMTWMKNGEEIVQEI
DYGDILPSGDGTYQAWASIELDPQSSNLYSCHVEHSGVHMVLQVP.
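
Protein sequences copied from a typeset page often pick up line breaks and stray spaces. A minimal sketch of how such a sequence might be normalized and screened before use is given below; the helper names are hypothetical, and the fragment is a short excerpt of the α-chain sequence above.

```python
# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove all whitespace and a trailing full stop, and upper-case the result."""
    return "".join(raw.split()).rstrip(".").upper()


def nonstandard_positions(sequence: str) -> list[tuple[int, str]]:
    """1-based positions of letters that are not standard amino acid codes."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence, start=1)
        if residue not in STANDARD_AMINO_ACIDS
    ]


# Excerpt containing a stray space and a line break.
fragment = "IVQINCTYQTSGFNGLFWY QQHAGEAPTFLSYNVLD\nGLEEKGRFSS"
sequence = clean_sequence(fragment)
print(sequence)
print(len(sequence), nonstandard_positions(sequence))
```

An empty list from `nonstandard_positions` means every letter is a standard residue code; any hit flags a character worth checking against the original figure or database entry.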
Analysis of MR1-5-OP-RU and MR1-5-OE-RU by mass spectrometry. MR1-5-OP-RU
or MR1-5-OE-RU (4 µg) were loaded onto an XBridge C18 reversed-phase
column (Waters) in 20 mM ammonium acetate, pH 5.4, buffer, and detected in an
Agilent ESI-TOF mass spectrometer after elution in an acetonitrile gradient. Data
were collected in negative ion mode. Different instrumentation resulted in slight
variations in retention times of the m/z 329.11, 315.09 and 317.10 species to those
reported previously’.

©2014 Macmillan Publishers Limited. All rights reserved

Generation of MR1(K43A), MR1-5-OP-RU and MR1-5-OE-RU tetramers. The
generation of MR1(K43A) tetramers, loaded with synthetic rRL-6-CH2OH, has
been previously described’. Briefly, refolded and purified empty carboxy-terminal
cysteine-tagged MR1(K43A) was loaded with a 136-fold molar excess of synthetic
rRL-6-CH2OH for 4 h at room temperature in the dark. C-terminal cysteine-tagged
wild-type MR1-5-OP-RU and MR1-5-OE-RU were generated as described earlier.

Cysteine-tagged MR1(K43A)-5-OP-RU, or cysteine-tagged wild-type MR1-5-OP-RU
or cysteine-tagged wild-type MR1-5-OE-RU were then reduced with 5 mM
DTT for 20 min before buffer exchange into PBS using a PD-10 column (GE Healthcare),
and biotinylated with Maleimide-PEG2 biotin (Thermo Scientific) with a 30:1
molar ratio of biotin:protein at 4 °C for 16 h in the dark. Biotinylated MR1 was
subjected to S200 10/300 GL (GE Healthcare) chromatography to remove excess
biotin. Biotinylated, loaded MR1(K43A), or wild-type MR1-5-OP-RU or wild-type
MR1-5-OE-RU monomers were tetramerized with streptavidin conjugated to either
PE (SA-PE) or Brilliant Violet 421 (SA-BV) (BD Pharmingen).

Isolation of PBMCs. Whole blood from healthy donors was collected (Australian
Red Cross Blood Service) and PBMCs were separated using Ficoll-Paque Premium
(GE Healthcare). PBMCs were harvested and resuspended in fresh RPMI medium. 
Cells were then washed twice before resuspension in 10% DMSO in FCS. Before 
use, PBMCs were stored in liquid nitrogen. 

Tetramer staining of human PBMCs. For co-staining with wild-type and MR1(K43A)
tetramers, approximately 5 × 10° human PBMCs were stained with
MR1(K43A)-5-OP-RU-PE tetramer at 20 µg ml−1 for 40 min at room temperature in
the dark. Cells were then washed and stained with wild-type MR1-5-OP-RU-BV
tetramer at 1.4 µg ml−1, CD3-AlexaFluor700 (eBioscience), CD161-PE-Cy7 (Biolegend),
CD4-APC-Cy7 (Biolegend) and CD8α-PerCP-Cy5.5 (BD) for 30 min at
4 °C. Cells were then washed once with 2 ml of FACS wash (2% fetal bovine serum
in PBS) and resuspended in 150 µl of FACS fix (2% glucose and 1% paraformaldehyde
in PBS) before acquisition of data on a BD LSR-Fortessa. Data were analysed
using FlowJo analysis software (Tree Star).

For single staining with either MR1-5-OP-RU or MR1-5-OE-RU tetramers, human
PBMCs were stained as described earlier with wild-type MR1-5-OP-RU-PE or
wild-type MR1-5-OE-RU-PE tetramers at 1.4 µg ml−1, and CD3-AlexaFluor700
(eBioscience), CD161-PE-Cy7 (Biolegend), CD4-APC-Cy7 (Biolegend) and
CD8α-PerCP-Cy5.5 (BD) for 30 min at 4 °C, before acquisition of data on a BD LSR-Fortessa.
Crystallization and structure determination. Crystals of the soluble MAIT
TCR-MR1-antigen complexes were obtained using the hanging-drop vapour diffusion
method. The MR1-β2M-5-OP-RU, MR1-β2M-5-OE-RU, MR1,
MR1(K43A)-β2M-5-OP-RU and MAIT TCR were concentrated to 4 mg ml−1, mixed in a 1:1
molar ratio, then 0.5 µl was added to 0.5 µl of a precipitant solution consisting of 0.1 M
Bis-Tris propane pH 6.3, 0.2 M sodium acetate and varying concentrations of PEG
3350 between 8 and 14% w/v. Crystals were observed after incubation at 20 °C for 24 h
in dark conditions and cryoprotected before diffraction experiments by soaking in
the crystallization condition modified with between 10 and 15% v/v glycerol before
cooling to 100 K. Diffraction images were collected at the Australian Synchrotron MX2
beamline; the crystals diffracted in a C2 space group to 2.50 Å, 2.10 Å and 2.20 Å for the
MR1-β2M-5-OP-RU, MR1-β2M-5-OE-RU and MR1(K43A)-β2M-5-OP-RU complexes
with the MAIT TCR, respectively. The data were processed using Mosflm version
7.0.9 and scaled using AIMLESS or SCALA (MR1(K43A)-β2M-5-OP-RU only)
from the CCP4 suite’’. The phase problem was solved by molecular replacement
using PHASER”, using MR1 ternary complex (PDB accession 4L4T)'* with CDR
loops and ligands removed and using the Rfree reflection set from the model. The
initial solution was refined in Phenix using simulated annealing refinement”, with
all subsequent refinement steps performed using BUSTER 2.10 (ref. 26). Restraints
for 5-OP-RU and 5-OE-RU were generated using the Grade Web Server, with model
building performed in COOT using MolProbity for validation”. All molecular
graphics were made with PyMOL.
Extended Data Figure 1 | MR1 ligand identification from different bacterial
strains. a, Detection of m/z 329.11 species in MR1 refolded with 5-A-RU
and methylglyoxal (control), and supernatants from wild-type (CB013) and
CB013-derivative (that is, ΔRibA, ΔRibB, ΔRibG or ΔRibH) L. lactis bacteria.
Shown are counts on the y-axis versus retention time on the x-axis. b, Lack of
activation of Jurkat.MAIT cells by supernatant from mutant ΔRibD/H S.
typhimurium (strain SL1344) but not wild-type (WT), or ΔRibD/H plus
RibD/H bacteria. Shown is MFI of CD69-APC on the y-axis. c, Detection of
m/z 329.11 species in MR1 refolded with supernatants from wild-type,
ΔRibD/H or ΔRibD/H plus RibD/H S. typhimurium bacteria, or control media.
Shown are counts on the y-axis versus retention time on the x-axis. d, Detection
of m/z 329.11 species in MR1 refolded with 5-A-RU and methylglyoxal
(control), or bacterial supernatants from L. lactis (CB013) or E. faecalis bacteria,
or control media. Shown are counts on the y-axis versus retention time on the
x-axis. e, Detection of m/z 329.11 species in MR1 refolded with 5-A-RU and
methylglyoxal (control), or supernatant from E. coli bacteria, or media. Shown
are counts on the y-axis versus retention time on the x-axis. Experiments
a-e were performed three, three, three, two and three times, respectively.
Extended Data Figure 2 | NMR characterization of 5-OP-RU. a-c, NMR
characterization of 5-OP-RU (3d) in DMSO-d6 with internal solvent peak at
2.50 p.p.m. and 39.52 p.p.m. for 1H and 13C, respectively. a, 1H NMR
(600 MHz); b, 13C NMR (150 MHz); c, HSQC. The compound 5-OP-RU
(3d) was synthesized from the reaction of 5-A-RU and methylglyoxal in
DMSO-d6, and then isolated from aqueous media by reversed-phase
high-performance liquid chromatography (rpHPLC). Although it was less
stable in water, it could still be identified and characterized at pH > 6.
Extended Data Figure 3 | Stability of 5-OP-RU. a, Reaction between 5-A-RU
(0.5 mM) and methylglyoxal (3 equivalents (eq)) at pH 6.8, 37 °C in MilliQ
water. b, Stability of purified 5-OP-RU (65 µM) at pH 6.8 and 37 °C. The
half-life was 135 min. c, Stability of purified 5-OP-RU (65 µM) at variable pH in
aqueous TBS buffer (10 mM Tris, 150 mM NaCl, pH 8.0), MilliQ water (pH 6.8)
or ammonium acetate buffer (20 mM, pH 5.4) at 15 °C. The half-lives were 15 h
at pH 8.0, 14.2 h at pH 6.8, 49 min at pH 5.4.
Extended Data Figure 4 | Electron density for ligands, and contacts
associated with MR1(K43A)-5-OP-RU MAIT TCR complex. a-h, Electron
density of 5-OP-RU in MR1, 5-OE-RU in MR1 and 5-OP-RU in MR1(K43A).
a-c, Final 2Fo − Fc map, contoured at 1σ for 5-OP-RU (a) and 5-OE-RU (b) in
the MAIT TCR-MR1-antigen complex, and 5-OP-RU (c) in the MAIT
TCR-MR1(K43A)-antigen complex. d-f, Simulated annealing omit maps
showing unbiased Fo − Fc electron density, contoured at 3σ, for 5-OP-RU
(d) and 5-OE-RU (e) in MR1, and 5-OP-RU (f) in MR1(K43A).
g, h, MR1(K43A)-5-OP-RU MAIT TCR complex showing contacts
between MR1(K43A) and 5-OP-RU (g) and contacts between MAIT TCR and
5-OP-RU (h). MR1 is shown in grey, MAIT TCR CDR3α in yellow and CDR3β
in orange with ribbon representation, and 5-OP-RU is shown in cyan with
stick representation. Hydrogen bonds are indicated with black dashed lines,
with a water molecule mediating hydrogen bonding between the CDR3β and
5-OP-RU shown in dark blue sphere representation.
Extended Data Figure 5 | Chromatographic and mass spectrometry
properties of MR1 ligands. a, Ligand eluted from MR1 complexed with the
product of (i) 5-A-RU and methylglyoxal condensation reaction (top); (ii)
5-A-RU and glyoxal condensation reaction (middle); or (iii) 5-A-RU and
13C-glycolaldehyde condensation reaction (bottom). Shown are extracted ion
chromatograms (left); m/z spectrum (centre); and product ions from targeted
fragmentation (right). Black diamonds indicate precursor ions. This
experiment was performed three times. b, Mass spectrometry characterization
of 5-OP-RU (top) and 5-OE-RU and 13C-5-OE-RU (bottom).
Extended Data Figure 6 | Mass spectrometry of the 315.09 species.
Extracted ion chromatograms of m/z 315.09 species in MR1 refolded with
5-A-RU and glyoxal (control), or E. coli supernatant, or media. Shown are
counts on the y-axis versus retention time on the x-axis. This experiment was
performed three times.
Extended Data Figure 7 | NMR characterization of RL. a-c, Spectra were
recorded as a solution in D2O-CD3OD (9:1) with internal solvent peak at … b, 13C NMR (150 MHz); c, HMBC.
Extended Data Figure 8 | NMR characterization of RL-7-Me. a-c, Spectra
were recorded as a solution in D2O-CD3OD (9:1) with internal solvent peak at
3.31 p.p.m. and 49.0 p.p.m. for 1H and 13C, respectively. a, 1H NMR (600 MHz)
and mechanism for deuterium exchange of CH3 at position 7. Identical …
deuterium exchange. c, HMBC.
Extended Data Table 1 | Data collection and refinement statistics

                          MAIT-MR1-5-OP-RU        MAIT-MR1-5-OE-RU        MAIT-K43A-MR1-5-OP-RU
Data collection
Temperature               100 K                   100 K                   100 K
Space group               C2                      C2                      C2
Cell dimensions
  a, b, c (Å)             218.76, 71.11, 144.28   218.11, 70.60, 143.86   215.58, 68.87, 142.98
  α, β, γ (°)             90, 104.87, 90          90, 104.63, 90          90, 104.86, 90
Resolution (Å)            33.42-2.50 (2.55-2.50)  75.41-2.10 (2.21-2.10)  50.00-2.20 (2.3-2.20)
Rpim* (%)                 9.4 (38.8)              6.1 (35.2)              5.9 (36.7)
I/σI                      7.8 (2.3)               8.1 (2.1)               9.7 (2.3)
Completeness (%)          100 (100)               98.6 (97.1)             97.9 (97.4)
Total no. observations    307877 (19059)          462978 (62837)          509054 (74702)
No. unique observations   74555 (4584)            122109 (17496)          101222 (14608)
Multiplicity              4.1 (4.2)               3.8 (3.6)               5.0 (5.1)
Refinement statistics
Rfactor† (%)              16.5                    18.4                    20.8
Rfree‡ (%)                21.6                    22.2                    24.5
No. atoms
  Protein                 12424                   12396                   12514
  Ligand                  45                      42                      46
  Water                   1044                    900                     488
Ramachandran plot (%)
  Most favoured           97.4                    91.4                    91.5
  Allowed region          2.5                     8.6                     8.5
B-factors (Å2)
  Protein                 29.8                    37.9                    37.4
rmsd bonds (Å)            0.010                   0.010                   0.010
rmsd angles (°)           1.16                    1.05                    1.08

Values in parentheses refer to the highest-resolution bin.
The Rfactor was calculated from all data except for 5% that was used for the Rfree calculation.
*R_{pim} = \sum_{hkl} [1/(N-1)]^{1/2} \sum_{i} |I_{hkl,i} - \langle I_{hkl} \rangle| / \sum_{hkl} \langle I_{hkl} \rangle
†R_{factor} = (\sum ||F_o| - |F_c||) / (\sum |F_o|); for all data except as indicated by ‡.
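
The Rfactor definition in the table footnote can be written directly as a short function. This is a sketch under stated assumptions: the structure-factor amplitudes below are made-up numbers for illustration, and real refinement programs apply scaling and reflection selection that are not shown here.

```python
def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over matched reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections (not data from the paper).
observed = [120.0, 85.0, 40.0, 15.0]
calculated = [110.0, 90.0, 36.0, 18.0]

print(f"R = {r_factor(observed, calculated):.3f}")
```

Rfree is the same quantity evaluated only on the reflections held out of refinement (5% of the data according to the footnote), while Rfactor uses the remaining working set.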
Caspase-11 activation requires lysis of
pathogen-containing vacuoles by IFN-induced GTPases 


Etienne Meunier, Mathias S. Dick, Roland F. Dreier, Nura Schürmann, Daniela Kenzelmann Broz, Soren Warming,
Merone Roose-Girma, Dirk Bumann, Nobuhiko Kayagaki, Kiyoshi Takeda, Masahiro Yamamoto & Petr Broz


Lipopolysaccharide from Gram-negative bacteria is sensed in the host 
cell cytoplasm by a non-canonical inflammasome pathway that ulti- 
mately results in caspase-11 activation and cell death’*. In mouse 
macrophages, activation of this pathway requires the production of 
type-I interferons*”, indicating that interferon-induced genes have 
a critical role in initiating this pathway. Here we report that a cluster 
of small interferon-inducible GTPases, the so-called guanylate-binding 
proteins, is required for the full activity of the non-canonical caspase-11 
inflammasome during infections with vacuolar Gram-negative bac- 
teria. We show that guanylate-binding proteins are recruited to intra- 
cellular bacterial pathogens and are necessary to induce the lysis of 
the pathogen-containing vacuole. Lysis of the vacuole releases bac- 
teria into the cytosol, thus allowing the detection of their lipopo- 
lysaccharide by a yet unknown lipopolysaccharide sensor. Moreover, 
recognition of the lysed vacuole by the danger sensor galectin-8 ini- 
tiates the uptake of bacteria into autophagosomes, which results in 
a reduction of caspase-11 activation. These results indicate that host- 
mediated lysis of pathogen-containing vacuoles is an essential immune 
function and is necessary for efficient recognition of pathogens by 
inflammasome complexes in the cytosol. 

Previous studies have reported that induction of caspase-11-dependent 
cell death by Gram-negative bacteria requires Trif-dependent production 
of type-I interferons (type-I-IFNs)** (Extended Data Fig. 1a). Type-I-
IFN production is however not required for pro-caspase-11 induction**”
and is dispensable for caspase-11 activation by transfected lipopolysaccharide
(LPS; Extended Data Fig. 1b)’. This indicates that interferon-
stimulated genes (ISGs) play a major role in activating caspase-11 in 
response to intracellular bacteria. To investigate which ISGs were involved 
in activating caspase-11, we used proteomics-based expression analysis 
to identify proteins that were highly induced following Salmonella infec- 
tion. Among the most strongly upregulated proteins were interferon- 
induced GTPases, such as the large 65-67 kDa guanylate-binding proteins 
(GBPs) and small 47 kDa immunity-related GTPases (IRGs) (data not 
shown). These proteins function in cell-autonomous immunity, that is, 
mechanisms that allow host cells to kill pathogens or restrict their replication, 
and have even been associated with the activation of inflammasomes*””. 

Mice have 11 GBPs, which are highly homologous and are clustered
in two genomic loci on chromosomes 3 and 5, respectively. Recently,
GBPs on chromosome 3 have been shown to restrict the replication of
Toxoplasma gondii in peritoneal macrophages and mice’. We therefore
infected bone-marrow-derived macrophages (BMDMs) from Gbpchr3
KO mice, which lack GBP1, 2, 3, 5 and 7 (Extended Data Fig. 2a-e), and
wild-type littermates with a number of Gram-negative vacuolar pathogens
that trigger caspase-11 activation (data not shown)’*° and determined
the activity of the non-canonical inflammasome pathway at 16 h post-infection
(Fig. 1a, b). Macrophages from Gbpchr3 KO mice showed a significant
reduction of cell death (as measured by lactate dehydrogenase
(LDH) release) and IL-1β secretion when infected with wild-type Salmonella
typhimurium, a type three secretion system (T3SS)-deficient
mutant of S. typhimurium (ΔSPI-2), Vibrio cholerae, Enterobacter cloacae
or Citrobacter koseri (Fig. 1a), and this was independent of LPS
or polyinosinic:polycytidylic acid (poly(I:C)) priming (Extended Data
Fig. 2f, g). Gbpchr3 deficiency also reduced secretion of caspase-1 p20 subunit,
caspase-11 and mature IL-1β, IL-18 and IL-1α (Fig. 1b). Because interferons
induce GBP expression (Extended Data Fig. 2b, c)®, we investigated
whether IFN-γ treatment would accelerate LDH release in response to
Salmonella infection. IFN-γ-treated wild-type BMDMs released LDH
as soon as 4 h after infection, whereas Gbpchr3 KO BMDMs failed to release
LDH at early time points even after IFN-γ priming (Fig. 1c), indicating
that GBP induction was required for activity of the non-canonical
inflammasome pathway.

We next explored whether GBPs play a role in the activation of canonical
inflammasomes. LPS-primed wild-type and Gbpchr3-deficient macrophages
released comparable levels of LDH and mature IL-1β when
infected with logarithmic phase S. typhimurium, which exclusively engage
the NLRC4 inflammasome via the SPI-1 T3SS (Fig. 1d)”. Similarly, Gbpchr3
deficiency did not affect AIM2 inflammasome activation upon poly
(deoxyadenylic-deoxythymidylic) acid (poly(dA:dT)) transfection (Fig. 1d).
Although GBP5 had been previously linked to NLRP3 activation’, we
did not observe a defect in NLRP3 activation in Gbpchr3 KOs (Fig. 1d),
possibly owing to different modes of pre-stimulation. These data indicate
that GBPs are dispensable for canonical inflammasome activity,
but are required for the activation of the non-canonical inflammasome
pathway.

To investigate whether GBPs directly mediated the detection of intracellular
LPS, we engaged the non-canonical inflammasome by transfecting
macrophages with different types of ultra-pure LPS (Fig. 1e).
Cytoplasmic LPS triggered LDH release and IL-1β secretion to a similar
extent in both wild-type and Gbpchr3-deficient BMDMs, indicating
that GBPs were required upstream of LPS sensing and only during bacterial
infection. We next investigated if GBPs were required for immune
detection of vacuolar or cytosolic bacteria by infecting BMDMs with
ΔsifA S. typhimurium and Burkholderia thailandensis, which rapidly
enter the cytosol and activate caspase-11 (ref. 13). Unprimed Gbpchr3 KO
and wild-type BMDMs responded comparably to these bacteria (Extended
Data Fig. 3a-c). Because GBPs might affect this response when pre-induced,
we also infected IFN-γ-primed BMDMs with ΔsifA S. typhimurium
(Extended Data Fig. 3d). IFN-γ priming indeed resulted in a
small difference between wild-type and Gbpchr3 KO BMDMs after infection
with ΔsifA Salmonella, yet not to the extent seen with wild-type
Salmonella (Fig. 1c), indicating that GBPs mainly participate in the
activation of the non-canonical inflammasome by vacuolar bacteria.

Finally, to investigate which GBP controls caspase-11 activation, all 
11 murine Gbps were individually knocked down in BMDMs and the 
cells were infected with flagellin-deficient Salmonella, which activate the 
non-canonical inflammasome but not NLRC4 (Extended Data Fig. 4a 
and Supplementary Information)*. Only knockdown of Gbp2 resulted 
in reduced LDH release and IL-1β secretion (Extended Data Fig. 4b-d).


1Focal Area Infection Biology, Biozentrum, University of Basel, CH-4056 Basel, Switzerland. 2Department Biomedicine, University of Basel, CH-4056 Basel, Switzerland. 3Genentech Inc., South San
Francisco, California 94080, USA. 4Department of Microbiology and Immunology, Osaka University, Yamadaoka, Suita, Osaka 565-0871, Japan.
Figure 1 | Caspase-11 activation by intracellular bacterial pathogens
requires GBPs. a, b, LDH release, IL-1β secretion (a) and immunoblots for
caspase-1, caspase-11, IL-1β, IL-18 and IL-1α (b) from unprimed BMDMs
infected for 16 h with the indicated bacteria (grown to stationary phase).
c, Time course measuring LDH release from unprimed or IFN-γ-primed
BMDMs infected with S. typhimurium. d, e, LDH release and IL-1β secretion
from primed BMDMs infected with SPI-1-expressing logarithmic phase
S. typhimurium, treated with monosodium urate, alum and nigericin or
transfected with poly(dA:dT) and LPS. f, LDH release and IL-1β secretion from
unprimed wild-type and Gbp2−/− BMDMs infected for 16 h with the indicated
bacteria (grown to stationary phase). Graphs show mean and s.d. of
quadruplicate wells and data are representative of two (b) and three
(a, c-f) independent experiments. *Crossreactive band; **P < 0.01; NS, not
significant (two-tailed t-test).


To validate these data we obtained BMDMs from Gbp2-/- mice and
wild-type littermates and infected them with vacuolar Gram-negative
bacteria. As expected, we observed reduced levels of cell death, cytokine
secretion and caspase release in Gbp2-/- BMDMs, indicating attenuated
activation of the non-canonical inflammasome (Fig. 1f and Extended
Data Fig. 4e), whereas direct LPS sensing or the activation of canonical
inflammasomes was not affected (Extended Data Fig. 4f, g). In contrast,
Gbp5 deficiency did not have any effect on canonical and non-canonical
inflammasome activation (Extended Data Fig. 5). Nevertheless, Gbp2
deficiency did not reduce caspase-11 activation as markedly as Gbp^chr3
deficiency, indicating that whereas caspase-11 activation mainly requires
GBP2, other GBPs might also be partially involved.

Reduced numbers of intracellular bacteria could account for low levels
of caspase-11 activation in Gbp^chr3- and Gbp2-deficient macrophages.
However, a comparison of wild-type and Gbp^chr3 KO BMDMs showed
that Gbp^chr3 deficiency resulted in significantly higher numbers of total
and live Salmonella per cell (Fig. 2a), consistent with higher colony-forming
unit numbers in Gbp^chr3 KO BMDMs (Extended Data Fig. 6). In
addition, fluorescence-activated cell sorting (FACS)-based analysis of
dead (mCherry-negative, FITC+) and live (mCherry-positive, FITC+)
Salmonella at 16 h post-infection found significantly fewer dead bacteria
(~20%) in Gbp^chr3 KO and Gbp2-/- BMDMs when compared to
wild-type BMDMs (>30%) (Fig. 2b). Importantly, bacterial killing in
Casp11-/- BMDMs was comparable to wild-type BMDMs, indicating
that the control of bacterial replication was directly linked to GBP function
and not to the activation of the non-canonical inflammasome
(Fig. 2b). In conclusion, we show that GBPs control bacterial replication
on a cell-autonomous level, which is consistent with a previous report
that GBP1 partially restricts Mycobacterium bovis and Listeria
monocytogenes replication.

Restricting bacterial replication has been proposed to require the asso- 
ciation of GBPs with pathogen-containing vacuoles and the recruitment 
of antimicrobial factors. We therefore investigated whether GBPs targeted
intracellular Gram-negative bacteria. Indeed, GBP2 could be detected
on intracellular bacteria within hours after infection (Fig. 2c). Very few
GBP-positive bacteria were detected in Stat1-/- BMDMs, which do not
respond to type-I and type-II IFNs and largely failed to induce GBP
expression (data not shown). Remarkably, GBP-positive Salmonella
seemed to have lost mCherry expression (Fig. 2c), indicating that these 
bacteria were dead. To determine whether GBPs are recruited to dead 
bacteria we infected BMDMs with Salmonella killed by heat, parafor- 
maldehyde or 70% ethanol treatment, yet only live Salmonella acquired 
GBP staining and activated the inflammasome (Fig. 2d). To examine this 
mechanism in vivo, we immunostained spleen tissue sections of mice 
infected with Salmonella for GBPs. Indeed, GBPs could also be found 
associated with approximately 20% of bacteria in vivo, and a significantly
higher proportion of these bacteria were dead, based on the loss
of mCherry expression (Fig. 2e-g). Furthermore, treatment with
IFN-γ-neutralizing antibodies reduced the percentage of GBP-positive
bacteria (Fig. 2f), consistent with reports that IFN-γ controls Salmonella
replication in vivo. Taken together, these results indicated that GBPs
either kill bacteria directly or control an antimicrobial effector pathway,
and raised the interesting possibility that GBP-mediated killing of
bacteria might result in the release of LPS and caspase-11 activation.

To identify the antimicrobial effector pathway that is controlled by
GBPs we first examined the role of free radicals. Although GBP7 was
reported to be required for reactive oxygen species (ROS) production
and to interact with the phagosome oxidase complex, we did not find
any role for ROS or NO production in caspase-11 activation (Extended
Data Fig. 7). Furthermore, GBPs were also proposed to recruit components
of the autophagy machinery to pathogen-containing vacuoles
(PCVs), possibly resulting in bacterial killing within autophagosomes.
Indeed, many GBP-positive S. typhimurium, E. cloacae and C. koseri 
co-stained for the commonly used autophagy marker LC3 (Fig. 3a and 
Extended Data Fig. 8a). Recruitment of LC3 to intracellular Salmonella 
was partially GBP-dependent, because we found significantly lower numbers
of LC3-positive Salmonella in Gbp^chr3 KO compared to wild-type
macrophages (Fig. 3b, c). Therefore, we speculated that autophagy-mediated
killing might result in the release of LPS from bacteria and
caspase-11 activation. Unexpectedly, however, pharmacological inhibition
of autophagy with 3-methyladenine (3-MA) resulted in significantly
higher levels of LDH release, IL-1β secretion and caspase-1/caspase-11
activation in macrophages infected with S. typhimurium,
E. cloacae or C. koseri (Fig. 3d, e), indicating increased activation of the
non-canonical inflammasome. Consistently, cell death was still
caspase-11-dependent because Casp11-/- BMDMs did not release LDH when
treated with 3-MA and infected with Gram-negative bacteria (Fig. 3f).
Direct activation of caspase-11 by LPS transfection was independent of 
autophagy (Fig. 3g), indicating that autophagy only counteracts
non-canonical inflammasome activation during bacterial infections. To further
confirm our data, we infected Atg5-/- BMDMs with S. typhimurium
and we also observed significantly higher levels of non-canonical
inflammasome activation compared to wild-type BMDMs (Fig. 3h, i). Taken
together, these results indicated that, although GBPs promoted the
uptake of bacteria into autophagosomes, autophagy actually counteracted
caspase-11 activation. Thus, GBP-dependent LPS detection occurs
before bacteria are targeted to autophagosomes.

A possible explanation could be that autophagy sequesters bacteria
that had escaped from the vacuole, and thus prevents further LPS release
into the cytosol. Recently, the cytosolic danger receptor galectin-8 was
reported to function as a marker for lysed vacuoles. Galectin-8 binds
β-galactosides, which are normally found on the inner leaflet of the
vacuolar membrane and get exposed to the cytosol upon vacuolar lysis.
Indeed, quantification of galectin-8-positive Salmonella showed that
significantly fewer bacteria were targeted by galectin-8 in Gbp^chr3 KO
BMDMs than in wild-type macrophages (Fig. 4a). Because galectin-8
colocalized with GBP- and LC3-positive Salmonella (Fig. 4b, c), we
speculated that GBPs promote LC3 recruitment through galectin-8.
Consistently, we found lower levels of galectin-8-positive Salmonella among
LC3-positive Salmonella in Gbp^chr3 KO compared to wild-type BMDMs
(Fig. 4d). Galectin-8 interacts with the autophagy adaptor protein NDP52,
which in humans contains binding sites for galectin-8, ubiquitin and
LC3. In line with a role for NDP52 in linking galectin-8 to LC3, murine
NDP52 colocalized with galectin-8 on intracellular Salmonella (Extended
Data Fig. 8b). Targeting of Salmonella to autophagosomes might also
involve other autophagy cargo adaptors, because p62 was associated with
the majority of LC3-positive bacteria, yet this was independent of GBPs
(Extended Data Fig. 8c, d). Altogether, these results suggested that GBPs
might promote the lysis of vacuoles or help to recruit galectin-8 to lysed
vacuoles.

Figure 2 | GBPs control bacterial replication. a, b, Quantification of live
(mCherry-positive) and dead (mCherry-negative) S. typhimurium per cell by
immunofluorescence (a) or as percent of total by flow cytometry (b) in
unprimed BMDMs at 16 h post-infection. c, Immunostaining for GBP2 and
quantification of live and dead Salmonella at 4 h post-infection. Arrowheads,
bacteria shown in insets. d, Quantification of GBP-positive bacteria, LDH
release and IL-1β secretion at indicated time points from BMDMs infected with
Salmonella, live or killed by different means. e, Immunohistochemistry for
GBP2 and Salmonella on spleen tissue from Salmonella (mCherry-positive)-
infected mice (representative of n = 3 per group). S. tm., S. typhimurium.
f, g, Quantification of GBP-positive Salmonella in anti-IFN-γ-treated or control
animals (f) and live and dead bacteria among GBP2-negative/-positive
Salmonella (g) (n = 3 per group). Scale bars, 10 μm (c), 1 μm (e). Graphs show
mean and 5-95 percentile (box plots) or s.d. of technical triplicates, and data are
representative of three independent experiments. *P < 0.05, **P < 0.01
(two-tailed t-test).


To confirm a direct role of GBPs in vacuolar lysis, we adapted a
phagosome integrity assay based on differential permeabilization with
digitonin (Extended Data Fig. 9). Comparing wild-type and Gbp^chr3 KO
BMDMs, we found significantly lower numbers of cytosolic (FITC+)
S. typhimurium in Gbp^chr3-deficient cells (Fig. 4e, f). Similarly, Gbp2-/-
BMDMs also harboured fewer cytosolic S. typhimurium compared to
BMDMs from wild-type littermates (Fig. 4g). In contrast, we did not
find a defect in cytosolic localization between wild-type and Gbp^chr3 KO
BMDMs infected with the specialized cytosolic pathogen Shigella
flexneri, which uses its T3SS to destabilize the phagosome and escape into
the cytoplasm (Fig. 4h). Although we cannot exclude that GBPs might
also be involved in the recruitment or assembly of the non-canonical 
inflammasome, these results indicate that GBPs, in particular GBP2, 
directly promote the destruction of vacuoles. 

In conclusion, our data demonstrate that host-induced destruction 
of PCVs or phagosomes is an essential immune function and assures 
recognition of vacuolar bacteria by cytosolic innate immune sensors 
(Extended Data Fig. 10). Additional studies are required to determine 
how GBPs distinguish ‘self’ and ‘non-self’ membranes and by which
mechanism phagosomes are lysed. In mice, this might involve the IRGM
proteins that can act as GDI (guanine nucleotide dissociation inhibitor)
and inhibit IRG and GBP activity. Absence of IRGMs results in
mislocalization of both IRGs and GBPs and even in degradation of lipid
droplets, supporting a model in which IRGM proteins would protect
‘self’-vacuoles from being targeted by host IRGs and GBPs. Because
both commensals and pathogens activate caspase-11 (ref. 1), it can be
assumed that GBPs are not specific towards pathogens but are a general
innate immune response against bacteria trapped in the phagosomes of
macrophages. Finally, given the important role of LPS-induced caspase-11
activation in septic shock, pharmaceutical targeting of the above-described
pathways might be used to modulate inflammation during
bacterial sepsis.

Figure 3 | Autophagy reduces caspase-11 activation. a, b, Unprimed BMDMs
infected with S. typhimurium for 4 h and immunostained for LC3 and GBP2.
Arrowheads, bacteria shown in insets. Scale bars, 10 μm. c, Quantification
of results from b. d-g, LDH release and immunoblots for caspase-1 and
caspase-11 from BMDMs infected for 16 h or transfected with LPS in presence
or absence of 3-methyladenine (3-MA). h, i, LDH release and IL-1β secretion
from BMDMs infected for 16 h or transfected with LPS. Graphs show
mean and s.d. of quadruplicate wells and data are representative of two
(e, i) and three (a-d, f-h) independent experiments. *P < 0.05, **P < 0.01;
NS, not significant (two-tailed t-test).


Figure 4 | GBP-mediated lysis of the PCV releases Salmonella into the
cytosol. a, Quantification of galectin-8-positive Salmonella in unprimed
BMDMs at 4 h post-infection. b, c, Unprimed BMDMs infected with
S. typhimurium for 4 h and immunostained for galectin-8, GBP2 and LC3.
Arrowheads, bacteria shown in insets. Scale bars, 10 μm.
d, Quantification of galectin-8/LC3-double-positive Salmonella at indicated
time points post-infection. e-h, Quantification of cytosolic and
vacuolar bacteria by flow cytometry in BMDMs infected with mCherry-positive
S. typhimurium (e-g) or S. flexneri (h, wild-type or ΔT3SS) for 4 h.
Graphs show mean and s.d. or 5-95 percentile (box plots) of technical
triplicates. Data are representative of 2 (g, h), 3 (a-d) and 4
(e, f) independent experiments. *P < 0.05, **P < 0.01 (two-tailed t-test).

METHODS SUMMARY

BMDMs were cultured and seeded for infections as described previously. Priming
was done overnight with PAM3CSK4 (1 μg ml⁻¹), LPS O111:B4 (0.1 μg ml⁻¹), murine
IFN-β or murine IFN-γ (1 unit per μl). S. typhimurium, S. flexneri, V. cholerae,
E. cloacae, C. koseri and B. thailandensis were grown overnight in LB or TSB medium at
37 °C with aeration. Bacteria were diluted in fresh pre-warmed macrophage medium
and added to the macrophages at a multiplicity of infection (m.o.i.) of 100:1 for
measurements of caspase-11 and caspase-1 activity or 10:1 for all other assays. For
assaying NLRC4 activation, Salmonella were subcultured for 4 h to induce SPI-1
T3SS expression before infection (m.o.i. 20:1). S. flexneri were subcultured for 3 h
to induce T3SS expression before infection (m.o.i. 30:1). When required, apocynin,
L-NG-nitroarginine methyl ester (L-NAME), 3-methyladenine or vehicle controls
were added 30 min before infection. Plates were centrifuged for 15 min at 500g to
synchronize the infection and placed at 37 °C for 1 h. Next, 100 μg ml⁻¹ gentamicin
was added to kill extracellular bacteria. After 1 h incubation, the cells were washed
once with DMEM and given fresh macrophage medium containing 10 μg ml⁻¹
gentamicin for the remainder of the infection. Transfection with poly(dA:dT) or
MSU, alum or nigericin treatment was done as described previously or as indicated.
All animal experiments were approved and performed according to local guidelines.
Female BALB/c mice (10-14 weeks old) were infected intravenously with
Salmonella (1,000 c.f.u.) and euthanized 4-5 days later. For antibody injections,
mice received on day 3 two intraperitoneal injections of 200 μl PBS containing
0.2 mg anti-IFN-γ monoclonal or 0.2 mg rat IgG1, κ isotype control antibody.


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 

METHODS

Bacterial strains and plasmids. Salmonella enterica serovar Typhimurium
(S. typhimurium) SL1344 and congenic mutants have been described previously. Other
bacterial strains used were Shigella flexneri, Vibrio cholerae, Enterobacter cloacae, 
Citrobacter koseri and Burkholderia thailandensis ATCC700388. 

Mice. Gbp^chr3 KO, Gbp2-/-, Atg5fl/fl-Lyz-Cre, Cybb-/- (gp91phox),
Casp1-/-/Casp11-/- (also known as caspase-1 knockout), Casp11-/- and
Casp1-/- (Casp1-/-/Casp11Tg) mice have been previously described. Mice were
bred in the animal facilities of the University of Basel, Genentech Inc.,
Heinrich-Heine-University Duesseldorf or the University of Osaka. Generation
of mice with Gbp5 KO alleles by zinc finger nuclease (ZFN) technology: A ZFN
pair was obtained from Sigma-Aldrich (SAGE Labs). The ZFN pair recognizes a
sequence in mouse Gbp5 exon 2 (cut site is underlined):
5'-TGCCATCACACAGCCAGTGGTGGTGGTAGCCATTGTGGGT-3'. ZFN mRNA and a donor plasmid
harbouring a 10-bp deletion in Gbp5 exon 2 were co-microinjected into C57BL/6N
zygotes using established procedures. One male founder carrying the 10-bp
deletion was obtained by homologous recombination (10-bp deletion is
underlined): 5'-TGCCATCACACAGCCAGTGGTGGTGGTAGCCATTGTGGGT-3'. This founder was
bred with C57BL/6N females to generate heterozygous progeny for subsequent
intercrossing. Two founders (a male and a female) carrying identical 1-bp
deletions were obtained by non-homologous end-joining (deleted bp is
underlined): 5'-TGCCATCACACAGCCAGTGGTGGTGGTAGCCATTGTGGGT-3'. These two
founders were intercrossed to directly generate homozygous progeny. Both the
10-bp (designated KO line 1) and 1-bp (designated KO line 2) deletions lead to
frameshifts and premature stop codons in Gbp5 exon 2.

Animal infection. All animal experiments were approved (license 2239, Kantonales
Veterinäramt Basel-Stadt) and performed according to local guidelines
(Tierschutz-Verordnung, Basel-Stadt) and the Swiss animal protection law (Tierschutz-Gesetz).
Female BALB/c mice (10-14 weeks old) were infected intravenously with
mCherry-positive Salmonella (1,000 c.f.u.) and euthanized 4-5 days later. For antibody
injections, mice (n = 3 per group) received on day 3 two intraperitoneal injections of
200 μl PBS containing 0.2 mg anti-IFN-γ monoclonal antibody (clone XMG1.2,
BioLegend) or 0.2 mg rat IgG1, κ isotype control antibody (clone RTK2071,
BioLegend). No randomization or blinding was performed.

Cell culture and infections. BMDMs were differentiated in DMEM (Invitrogen)
with 10% v/v FCS (Thermo Fisher Scientific), 10% MCSF (L929 cell supernatant),
10 mM HEPES (Invitrogen), and nonessential amino acids (Invitrogen). One day before
infection, macrophages were seeded into 6-, 24-, or 96-well plates at a density
of 1.25 × 10⁶, 2.5 × 10⁵, or 5 × 10⁴ per well. If required, macrophages were
pre-stimulated with PAM3CSK4, LPS O111:B4 (InvivoGen), mIFN-β or mIFN-γ
(eBioscience). For infections with S. typhimurium, V. cholerae, E. cloacae, C. koseri
and B. thailandensis, bacteria were grown overnight in LB or TSB at 37 °C with
aeration. The bacteria were diluted in fresh pre-warmed macrophage medium and
added to the macrophages at a multiplicity of infection (m.o.i.) of 100:1 for
measurements of caspase-11 and caspase-1 activity or 10:1 for all other assays. For
assaying Salmonella-induced NLRC4 activation, Salmonella were subcultured for
4 h before infection to induce SPI-1 T3SS and flagellin expression. S. flexneri were
cultured overnight in TSB medium and subcultured for 3 h before infection to
induce T3SS expression. IFN-γ-primed BMDMs (to induce GBP expression) were
infected with S. flexneri at an m.o.i. of 30:1 for FACS analysis. When required, the
chemical reagents apocynin (Sigma Aldrich, 100 μM), L-NG-nitroarginine methyl
ester (L-NAME; Sigma Aldrich, 100 μM) and 3-methyladenine (Sigma Aldrich,
5 mM) were added 30 min before infection. The plates were centrifuged for 15 min
at 500g to ensure comparable adhesion of the bacteria to the cells and placed at
37 °C for 60 min. Next, 100 μg ml⁻¹ gentamicin (Invitrogen) was added to kill
extracellular bacteria. After a 60-min incubation, the cells were washed once with
DMEM and given fresh macrophage medium containing 10 μg ml⁻¹ gentamicin
for the remainder of the infection. For infections with killed bacteria, Salmonella
were grown as above. Shortly before the infection, bacteria were left untreated or
incubated for 30 min at 95 °C, in 4% paraformaldehyde or in 70% ethanol.
Following the treatment, bacteria were washed with PBS and prepared for infections
as outlined above. The effectiveness of the killing procedures was verified by
plating serial dilutions. Transfection with poly(dA:dT) or treatment with MSU, alum
or nigericin was done as described previously or as indicated.
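
The inoculum arithmetic implied by the m.o.i. values above can be sketched as follows. This is a minimal illustration only: the function names are introduced here, and the culture density of 2 × 10⁹ c.f.u. per ml in the usage note is a hypothetical value, not one reported in these Methods.

```python
def bacteria_per_well(cells_per_well: float, moi: float) -> float:
    """Bacteria needed so that the bacteria:macrophage ratio equals the m.o.i."""
    if cells_per_well <= 0 or moi <= 0:
        raise ValueError("cells_per_well and moi must be positive")
    return cells_per_well * moi


def inoculum_volume_ul(cells_per_well: float, moi: float,
                       culture_cfu_per_ml: float) -> float:
    """Volume of bacterial culture (in microlitres) that delivers the m.o.i.

    culture_cfu_per_ml is the measured density of the overnight culture;
    it has to be determined experimentally and is an assumption here.
    """
    if culture_cfu_per_ml <= 0:
        raise ValueError("culture_cfu_per_ml must be positive")
    volume_ml = bacteria_per_well(cells_per_well, moi) / culture_cfu_per_ml
    return volume_ml * 1000.0
```

For a 96-well plate seeded with 5 × 10⁴ macrophages per well, an m.o.i. of 100:1 corresponds to 5 × 10⁶ bacteria per well; with the hypothetical culture density above, `inoculum_volume_ul(5e4, 100, 2e9)` gives the culture volume to dilute into the macrophage medium.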

siRNA knockdown. Gene knockdown was done using GenMute (SignaGen) and
siRNA pools (siGenome, Dharmacon). Briefly, wild-type BMDMs were seeded into
24- or 96-well plates at a density of 1.5 × 10⁵ or 3 × 10⁴ per well. siRNA complexes
were prepared at 25 nM siRNA in 1× GenMute Buffer according to the
manufacturer’s instructions for forward knockdowns. siRNA complexes were mixed with
BMDM medium and added onto the cells. BMDMs were infected with S. typhimurium
at an m.o.i. of 100:1 after 56 h of knockdown and analysed for inflammasome
activation as outlined below. siRNA pools included: Casp11 (that is, Casp4)
(M-042432-01), Gbp1 (M-040198-01), Gbp2 (M-040199-00), Gbp3 (M-063076-01),
Gbp4 (M-047506-01), Gbp5 (M-054703-01), Gbp6 (M-041286-01), Gbp7
(M-061204-01), Gbp8 (M-059726-01), Gbp9 (M-052281-01), Gbp10 (M-073912-00),
Gbp11 (M-079932-00) and NT (non-targeting) pool 2 (D-001206-14). See
Supplementary Information for sequences.

LPS transfection. Macrophages were seeded as described above. Cells were
pre-stimulated with 10 μg ml⁻¹ of PAM3CSK4 for 4 h in Opti-MEM and transfected
for 16 h with ultrapure LPS E. coli O111:B4, ultrapure LPS E. coli K12 or ultrapure
LPS Salmonella minnesota (InvivoGen) in complex with FuGeneHD (Promega) as
described previously.

Cytokine and LDH release measurement. IL-1β and tumour necrosis factor (TNF)-α
were measured by ELISA (eBioscience). LDH was measured using the LDH
Cytotoxicity Detection Kit (Clontech). To normalize for spontaneous lysis, the percentage
of LDH release was calculated as follows: (LDH infected − LDH uninfected)/(LDH
total lysis − LDH uninfected) × 100.
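
The normalization above can be written as a short function. This is a sketch under the assumption that the three inputs are background-corrected absorbance readings from the same plate; the function and argument names are introduced here for illustration.

```python
def percent_ldh_release(infected: float, uninfected: float,
                        total_lysis: float) -> float:
    """Percentage LDH release, normalized for spontaneous lysis.

    Implements (infected - uninfected) / (total_lysis - uninfected) * 100,
    where 'uninfected' is the spontaneous-release control and 'total_lysis'
    is the fully lysed control.
    """
    denominator = total_lysis - uninfected
    if denominator <= 0:
        raise ValueError("total_lysis must exceed the uninfected control")
    return (infected - uninfected) / denominator * 100.0
```

An infected well that reads the same as the uninfected control yields 0%, and a well that reads the same as the total-lysis control yields 100%.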

Western blotting. Western blotting was done as described before. Antibodies used
were rat anti-mouse caspase-1 antibody (1:1,000; 4B4; Genentech), rat anti-mouse
caspase-11 (1:500; 17D9; Sigma), rabbit anti-IL-1α (1:1,000; ab109555; Abcam),
rabbit anti-IL-18 (1:500; 5180R; Biovision), goat anti-mouse IL-1β antibody (1:500;
AF-401-NA; R&D Systems) and rabbit anti-GBP2 and rabbit anti-GBP5 (1:1,000;
11854-1-AP/13220-1-AP; Proteintech). Cell lysates were probed with anti-β-actin
antibody (Sigma) at 1:2,000.

Statistical analysis. Statistical data analysis was done using Prism 5.0a (GraphPad 
Software, Inc.). To evaluate the differences between two groups (cell death, cyto- 
kine release, FACS, CFU and immunofluorescence-based counts) the two-tailed 
t-test was used. In figures, NS indicates ‘not significant’; P values are given in figure
legends.
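
For orientation, the test statistic behind a two-group t-test can be sketched in a few lines. The analysis here was run in Prism, and the text does not state whether the test was paired or whether equal variances were assumed; the sketch below assumes an unpaired test with pooled variance, and its function name is introduced for illustration only. It returns the statistic and degrees of freedom, and leaves the two-tailed P value to a t-distribution table or statistics package.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent groups.

    Assumes equal variances (pooled estimate); this is an assumption of the
    sketch, not a detail given in the Methods.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    std_err = sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, dof
```

With quadruplicate wells per condition, as in most figure panels, the comparison has 6 degrees of freedom.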

Immunofluorescence. Macrophages were seeded on glass coverslips and infected
as described above. At the desired time points cells were washed 3× with PBS and
fixed with 4% paraformaldehyde for 15 min at 37 °C. Following fixation, coverslips
were washed and the fixative was quenched with 0.1 M glycine for 10 min at room
temperature. Coverslips were stained with primary antibodies at 4 °C for 16 h, washed
4× with PBS, incubated for 1 h with appropriate secondary antibodies at room
temperature (1:500, AlexaFluor, Invitrogen), washed 4× with PBS and mounted
on glass slides with Vectashield containing 4',6-diamidino-2-phenylindole (DAPI)
(Vector Labs). Antibodies used were rabbit anti-LC3 (1:1,000; NB600-1384, Novus),
mouse anti-LC3 (1:100, 2G6, NanoTools), guinea-pig anti-p62 (1:100, GP62-C,
Progen), goat anti-Salmonella (1:500, CSA-1 and CSA-1-FITC, KPL), mouse
anti-galectin-8 (1:1,000, G5671, Sigma), goat anti-galectin-8 (1:100, AF1305, R&D),
rabbit anti-Optineurin (1:100, ab23666, Abcam), rabbit anti-NDP52 (1:100, D01,
Abnova), anti-PDI (1:100, ADI-SPA-890, Enzo Lifesciences), anti-Calnexin (1:100,
ADI-SPA-860-D, Enzo Lifesciences), goat anti-GBP1-5 (1:100, sc-166960, Santa
Cruz Biotech), rabbit anti-GBP2 and rabbit anti-GBP5 (1:100;
11854-1-AP/13220-1-AP; Proteintech). Coverslips were imaged on a Zeiss LSM700 or a Leica SP8 at ×63
magnification. Colocalization studies were performed as blinded experiments, with in
general a minimum count of 100 bacteria per coverslip and performed in triplicate.
Immunofluorescence-based counts of live (mCherry+/FITC+) and dead
(mCherry−/FITC+) bacteria were done as blinded experiments on z stacks taken from
15 random fields in three biological replicates, with a total of approximately 10,000
bacteria counted.

Immunohistochemistry. Cryosections were blocked in 1% blocking reagent
(Invitrogen) and 2% mouse serum (Invitrogen) in TBST (0.05% Tween in 1× TBS pH 7.4),
and stained with primary and secondary antibodies (goat anti-CSA1; 1:500;
01-91-99-MG; KPL and anti-GBP2; 1:100; 11854-1-AP; Proteintech). Secondary
antibodies included Santa Cruz Biotech sc-362245 and Molecular Probes A21206,
A21445 and A21469.

ROS assay. Measurement of oxygen-dependent respiratory burst of BMDMs was
performed by chemiluminescence in the presence of
5-amino-2,3-dihydro-1,4-phthalazinedione (luminol, Sigma Aldrich, 66 μM) using a thermostatically (37 °C)
controlled luminometer. Both oxygen and nitrogen species were detected (O2⁻,
ONOO⁻, OH·). Chemiluminescence generation was monitored every minute for
1 h after IFN-γ (100 U ml⁻¹) and/or Salmonella challenge and expressed as counts
per minute.

NO assay. Nitrite production was measured by the Griess assay as previously
described. Briefly, in 96-well plates, BMDMs were infected as described above
in presence or absence of IFN-γ or IL-1β for 16 h. Supernatants were mixed 1:1 with
2.5% phosphoric acid solution containing 1% sulfanilamide and 0.1%
naphthylenediamine. After 30 min incubation at room temperature, the nitrite concentration
was determined by measuring absorbance at 550 nm. Sodium nitrite (Sigma) was
used as a standard to determine nitrite concentrations in the cell-free medium.
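
Reading nitrite concentrations off a sodium nitrite standard curve can be sketched as a linear fit followed by inversion. This is an illustration only: it assumes the absorbance at 550 nm is linear in nitrite over the range of the standards, and the function names and the example standard values are hypothetical rather than taken from the protocol.

```python
def fit_standard_curve(concentrations, absorbances):
    """Ordinary least-squares line A550 = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nitrite_concentration(absorbance, slope, intercept):
    """Invert the standard curve to estimate nitrite in a sample."""
    if slope == 0:
        raise ValueError("standard curve has zero slope")
    return (absorbance - intercept) / slope


# Hypothetical standards (concentration in micromolar, absorbance at 550 nm).
slope, intercept = fit_standard_curve([0, 10, 20, 40], [0.05, 0.15, 0.25, 0.45])
sample_nitrite = nitrite_concentration(0.35, slope, intercept)
```

Samples reading above the highest standard fall outside the fitted range and would normally be diluted and re-measured rather than extrapolated.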

Digitonin assay. For flow-cytometry-based quantification of cytoplasmic and
vacuolar bacteria, macrophages were infected with mCherry+ S. typhimurium or
mCherry+ S. flexneri as described above. At the desired time point, cells were washed
3× with KHM buffer (110 mM potassium acetate, 20 mM HEPES, 2 mM MgCl2,
pH 7.3) and incubated for 1 min in KHM buffer with 150 μg ml⁻¹ digitonin (Sigma).
Cells were immediately washed 2× with KHM buffer and then stained for 12 min
with anti-Salmonella-FITC (1:500, CSA-1, KPL) or anti-Shigella (1:100, BP1064,
Acris) in KHM buffer with 2% BSA. The secondary antibody used for S. flexneri
staining was anti-Rabbit-488 (1:500, Invitrogen). Cells were washed 3× with PBS and
lysed in PBS with 0.1% Triton-X (Sigma) and analysed on a FACS-Canto-II.
Controls were included in every assay and are described in Extended Data Fig. 9.

Live/dead analysis by FACS. Infection of macrophages was performed using
mCherry+ bacteria as described above. At 16 h post-infection cells were washed
and lysed with PBS solution containing 0.1% Triton X-100 (Sigma Aldrich) to release
intracellular bacteria. Salmonella were counterstained using an anti-Salmonella
antibody (CSA-1, KPL) and analysed using a FACS Canto-II for fluorescence intensities
in FL-1 and FL-2 channels. Data were analysed with FlowJo 10.0.6 software. The
gate was set for the bacterial population based on the FSC/SSC and the
anti-Salmonella staining (CSA-1-FITC, KPL). Controls included live mCherry-expressing
and mCherry-negative Salmonella stained with anti-Salmonella antibodies (CSA-1,
KPL).
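
The live/dead classification used in this analysis (all bacteria FITC+, live bacteria additionally mCherry+) can be sketched as a simple count over gated events. The representation of each event as a pair of booleans and the function name are assumptions made for illustration; in practice the positivity thresholds come from the gating controls described above.

```python
def percent_dead_bacteria(events):
    """Percentage of dead bacteria among anti-Salmonella-positive events.

    events: iterable of (mcherry_positive, fitc_positive) booleans, one per
    gated event. FITC-negative events are not counted as bacteria.
    Dead = FITC+ and mCherry-; live = FITC+ and mCherry+.
    """
    total = 0
    dead = 0
    for mcherry_positive, fitc_positive in events:
        if not fitc_positive:
            continue  # not stained by the anti-Salmonella antibody
        total += 1
        if not mcherry_positive:
            dead += 1
    if total == 0:
        raise ValueError("no FITC-positive events in the gate")
    return 100.0 * dead / total
```

The result corresponds to the "percent of total" readout reported for the flow-cytometry panels.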

Extended Data Figure 1 | Type-I-interferon signalling is required to induce
caspase-11-dependent cell death in response to bacterial infection, but
not in response to LPS transfection. a, LDH release from unprimed
BMDMs infected for 16 h with wild-type (WT) S. typhimurium or ΔSPI-2
S. typhimurium grown to stationary phase. b, LDH release from primed
BMDMs transfected with LPS O111:B4. Graphs show the mean and s.d. of
quadruplicate wells and are representative of three independent experiments.

Extended Data Figure 2 | BMDMs from Gbp^chr3 KO mice have normal
responses to priming stimuli, but fail to activate the non-canonical
inflammasome during bacterial infections. a, Schematic representation of
the GBP locus on murine chromosome 3. The extent of the deletion in Gbp^chr3
KO mice is indicated. b-d, Induction of pro-caspase-11, GBP2 and GBP5
expression in lysates of wild-type and Gbp^chr3 KO BMDMs stimulated for 16 h
with the indicated amounts of murine IFN-β, murine IFN-γ or LPS O111:B4.
e, TNF-α release from BMDMs stimulated for 16 h with LPS O111:B4. f, g, LDH
release and IL-1β secretion from wild-type and Gbp^chr3 KO BMDMs infected
for 16 h with wild-type (WT) S. typhimurium, ΔSPI-2 S. typhimurium,
V. cholerae, E. cloacae or C. koseri grown to stationary phase. Cells were primed
overnight with LPS (f) or poly(I:C) (g). *Indicates background band. Graphs
show the mean and s.d. of quadruplicate wells and data are representative of
two independent experiments. **P < 0.01, NS, not significant (two-tailed
t-test).

B. thailandensis grown to stationary phase. d, LDH release and IL-1β secretion

significant (two-tailed t-test).

inflammasome activation during Salmonella infection, but is dispensable
for direct LPS sensing and canonical inflammasomes. a, Schematic drawing
of the inflammasome pathways activated by flagellin-deficient Salmonella.

phase. f, g, LDH release and IL-1β secretion from primed wild-type and
Gbp2-/- BMDMs transfected with the indicated types of LPS for 16 h, treated
with nigericin for 1 h, infected with SPI-1 T3SS expressing logarithmic phase

Extended Data Figure 5 | Normal activation of non-canonical and

infected for 16 h with wild-type (WT) S. typhimurium, ΔSPI-2 S. typhimurium,
V. cholerae, E. cloacae or C. koseri grown to stationary phase (b), transfected
with the indicated LPS for 16 h (c) infected for 1 h with SPI-1 T3SS expressing

quadruplicate wells and data are representative of three independent
experiments.

experiments.


©2014 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 7 | Inhibition of ROS and NO production does not
affect non-canonical inflammasome activation. a, b, ROS levels, LDH release
and IL-1β secretion in unprimed BMDMs left uninfected or infected for 16 h
with wild-type S. typhimurium grown to stationary phase. c-e, LDH release,
IL-1β secretion, ROS levels and immunoblots for processed caspase-1 and
caspase-11 released from unprimed BMDMs infected for 16 h with wild-type
(WT) S. typhimurium or E. cloacae grown to stationary phase in the presence of
the ROS inhibitor (apocynin) or a vehicle control (DMSO). f, g, LDH
release, IL-1β secretion and immunoblots for processed caspase-1 and
caspase-11 released from unprimed BMDMs infected for 16 h with wild-type
S. typhimurium or E. cloacae grown to stationary phase in the presence of the
iNOS inhibitor (L-NAME) or a vehicle control (DMSO). h, NO release from
unprimed or IFN-γ-primed BMDMs infected for 16 h with S. typhimurium in
presence of the iNOS inhibitor (L-NAME) or a vehicle control (DMSO). Ext,
extract; SN, supernatant. Graphs show the mean and s.d. of quadruplicate wells
and data are representative of two (a-c, e-g) and three (d, h) independent
experiments. NS, not significant (two-tailed t-test).
Extended Data Figure 8 | Colocalization of GBPs and autophagy proteins
on intracellular bacteria. a, Colocalization of LC3 with GBPs in unprimed
wild-type BMDMs infected with E. cloacae or C. koseri for 4 h and stained
for LC3, GBP2 and DNA. b, Colocalization of galectin-8 and NDP52 in
unprimed wild-type BMDMs infected with wild-type S. typhimurium for 4 h
and stained for galectin-8, NDP52 and DNA. c, Colocalization of p62 and LC3
in unprimed wild-type BMDMs infected with wild-type S. typhimurium for
4 h and stained for LC3, p62 and DNA. d, Quantification of p62 and LC3
co-staining in wild-type and Gbp^chr3 KO BMDMs at 4 h post-infection with
Salmonella. Arrowheads indicate region shown in insets. Scale bars, 1 µm (a)
and 10 µm (b, c). Graph shows the mean and s.d. of triplicate counts and images
and graph are representative of at least two independent experiments. NS, not
significant (two-tailed t-test).
Extended Data Figure 9 | Digitonin-based quantification of cytoplasmic
bacteria. a, Immunostaining for calnexin and PDI (protein disulphide 
isomerase) in wild-type BMDMs left untreated or permeabilized with digitonin 
or saponin. b, Differentially permeabilized macrophages stained for cytosolic 
Extended Data Figure 10 | Model for the role of GBPs and autophagy in
caspase-11 activation. The pathogen-containing vacuole (PCV) of vacuolar bacterial
pathogens is recognized by interferon-induced GBPs in an unknown manner.
GBPs promote the lysis of the PCV either directly or indirectly, resulting in
the release of the bacteria into the cytosol and activation of caspase-11 by
bacterial LPS. β-galactosides of the lysed vacuole serve as danger signals upon
exposure to the cytosol and are recognized by galectin-8 leading to the
recruitment of the autophagy machinery. p62 participates in this process by
recognizing ubiquitin-chains on the vacuole or the bacterium. Uptake of the
bacterium and the lysed vacuole into autophagosomes reduces caspase-11
activation by removing the source of LPS from the cytosol.
Reconstructing lineage hierarchies of the distal lung
epithelium using single-cell RNA-seq

Barbara Treutlein, Doug G. Brownfield, Angela R. Wu, Norma F. Neff, Gary L. Mantalas, F. Hernan Espinoza,
Tushar J. Desai, Mark A. Krasnow & Stephen R. Quake


The mammalian lung is a highly branched network in which the 
distal regions of the bronchial tree transform during development 
into a densely packed honeycomb of alveolar air sacs that mediate 
gas exchange. Although this transformation has been studied by 
marker expression analysis and fate-mapping, the mechanisms that 
control the progression of lung progenitors along distinct lineages 
into mature alveolar cell types are still incompletely known, in part 
because of the limited number of lineage markers'* and the effects 
of ensemble averaging in conventional transcriptome analysis exper- 
iments on cell populations’ *. Here we show that single-cell tran- 
scriptome analysis circumvents these problems and enables direct 
measurement of the various cell types and hierarchies in the devel- 
oping lung. We used microfluidic single-cell RNA sequencing (RNA- 
seq) on 198 individual cells at four different stages encompassing 
alveolar differentiation to measure the transcriptional states which 
define the developmental and cellular hierarchy of the distal mouse 
lung epithelium. We empirically classified cells into distinct groups 
by using an unbiased genome-wide approach that did not require a 
priori knowledge of the underlying cell types or the previous puri- 
fication of cell populations. The results confirmed the basic outlines 
of the classical model of epithelial cell-type diversity in the distal 
lung and led to the discovery of many previously unknown cell-type 
markers, including transcriptional regulators that discriminate bet- 
ween the different populations. We reconstructed the molecular steps 
during maturation of bipotential progenitors along both alveolar 
lineages and elucidated the full life cycle of the alveolar type 2 cell
lineage. This single-cell genomics approach is applicable to any developing
or mature tissue to robustly delineate molecularly distinct cell
types, define progenitors and lineage hierarchies, and identify lineage- 
specific regulatory factors. 

In mice, alveolar epithelial cells differentiate between embryonic days 
(E)16.5 and 18.5: distal airway tips expand into sac-like configurations 
(‘sacculation’) as a morphologically uniform population of columnar progenitors
proceeds towards the fate of either flat alveolar type 1 (AT1) cells
specialized for gas exchange or surfactant-secreting cuboidal alveolar 
type 2 (AT2) cells (Extended Data Fig. 1). At each time point during 
sacculation, progenitors, intermediates and recently differentiated cells 
coexist (Fig. 1a)°. To resolve the cellular composition of the developing 
bronchio-alveolar epithelium, we initially sequenced transcriptomes of 
80 individual live cells of the developing mouse lung epithelium late in 
sacculation (E18.5; three biological replicates). Single-cell suspensions 
of micro-dissected distal lung regions were purified by magnetic-activated 
cell sorting (MACS) to deplete leukocytes and alveolar macrophages 
and enrich for epithelial cells (CD45−/EpCAM+) (Extended Data Fig. 2).
An automated microfluidic platform was used to capture and lyse indi- 
vidual epithelial cells, reverse transcribe RNA and amplify complemen- 
tary DNA. 

RNA-seq libraries from the amplification products of single cells as 
well as bulk control samples were sequenced to a depth of (2–5) × 10⁶
reads per library (Methods). Saturation analysis confirmed that this 
sequencing depth is sufficient to detect most genes expressed by single 
cells (Extended Data Fig. 3a). Technical noise and dynamic range were 
Figure 1 | Single-cell RNA-seq of 80 embryonic (E18.5) mouse lung
epithelial cells enables unbiased identification of alveolar, bronchiolar and 
progenitor cell populations. a, Spatially heterogeneous differentiation of 
distal lung epithelium. The micrograph of a newly forming alveolar sac 
(asterisk) and the diagram below illustrate cell types and the gradient of 
developmental intermediates comprising the distal lung epithelium during 
sacculation (E18.5). Micrograph: green, Pdpn, alveolar type 1 (AT1) marker; 
red, Sftpc, AT2 marker; white, E-cadherin, pan-epithelial marker. BPs are 
characterized by co-expression of some AT1 and AT2 markers. In the diagram, 
BPs (brown) persist at the tip, and nascent AT2 (red) and AT1 (orange) cells are
located more proximally. Ciliated (green) and Clara (blue) cells are located in
the bronchiolar epithelium (not labelled in the micrograph). Scale bar, 75 µm.
b, PCA of 80 single-cell transcriptomes (three biological replicates) at E18.5 
distinguishes between major bronchiolar and alveolar cell lineages. PC, 
principal component. c, Distinct gene groups characterize each cell population 
on the basis of differential correlation with PC1 and PC3. The arrow tip denotes 
the correlation coefficient of the respective gene with each principal 
component. 
assessed by using RNA control spike-in standards and by comparing
single cells with the bulk samples (Extended Data Fig. 3b-e). The results 
are consistent with previous data from our group’ and others**°; we 
obtained single-transcript sensitivity and high (~10⁶) dynamic range.
Comparison of three biological replicate experiments showed that median 
expression of all genes across single cells was strongly correlated (r = 0.91 
and r = 0.92; Extended Data Fig. 3f, g). 
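
The replicate comparison above can be illustrated with a minimal sketch, assuming that each replicate is a list of per-cell expression vectors sharing one gene order. The function names and the toy values are hypothetical and are not taken from the authors' scripts; the sketch only shows how a per-gene median across cells and a Pearson coefficient between two replicates could be computed.

```python
from math import sqrt
from statistics import median


def gene_medians(cells):
    """Per-gene median across cells; `cells` is a list of equal-length vectors."""
    return [median(values) for values in zip(*cells)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented toy data: rows are cells, columns are four genes (log-scale levels).
replicate_1 = [[0.0, 2.0, 5.0, 9.0], [0.5, 2.5, 4.0, 10.0], [0.0, 3.0, 6.0, 8.0]]
replicate_2 = [[0.2, 2.2, 5.5, 9.5], [0.0, 1.8, 4.5, 9.0]]

r = pearson_r(gene_medians(replicate_1), gene_medians(replicate_2))
```

Taking the median across cells before correlating reduces the influence of individual cells with unusually high or low expression of a gene, which is one plausible reason for comparing replicates at the level of medians.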

We performed principal component analysis (PCA) on all 80 single- 
cell transcriptomes by using genes expressed in more than two cells and 
with a non-zero variance (8,578 genes). Genes with highest loadings in 
the first four principal components were analysed by unsupervised 
hierarchical clustering as well as PCA (Fig. 1b, c, Fig. 2a and Supplemen- 
tary Data). This unbiased approach detected five different cell popula- 
tions and four different gene families, which permutation analysis showed 
to be highly significant (Methods). Using known marker genes within 
the different clusters, we were able to associate cells with four previ- 
ously reported epithelial cell types (Clara (Scgb1a1), ciliated (Foxj1), 
AT1 (Pdpn, Ager) and AT2 (Sftpc, Sftpb) cells). The fifth group was characterized
by co-expression of AT1 and AT2 marker genes and was located
on the PCA plot between the populations of AT1 and AT2 cells, sug- 
gesting either an intermediate population undergoing a transition between 
the two alveolar lineages or a population of bipotential alveolar pro- 
genitors. As discussed below, transcriptional profiles of distal lung epi- 
thelial cells at E16.5 implicate this fifth population as alveolar bipotential 
progenitor (BP) cells®. We validated these findings in two biological 
replicates of pooled E18.5 lungs by microfluidic single-cell quantitative 
PCR (qPCR) experiments: hierarchical clustering of ten known alve- 
olar and bronchiolar marker genes identified the same five populations 
(Extended Data Fig. 5a–d). Together, these results show that single-cell
RNA-seq enables the identification and molecular characterization of 
cell types and developmental intermediates retrospectively without the 
need to first purify populations of interest. 
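
The gene filter and the PCA step described above can be sketched as follows, assuming a cells-by-genes matrix of expression values. This is an illustration in plain Python, not the authors' R code: the function names, the power-iteration approach to the first principal component and the toy matrix are all assumptions made for the example.

```python
from math import sqrt


def filter_genes(matrix, min_cells=3):
    """Indices of genes detected in more than two cells with non-zero variance."""
    kept = []
    for j, column in enumerate(zip(*matrix)):
        n_expressing = sum(1 for value in column if value > 0)
        if n_expressing >= min_cells and max(column) != min(column):
            kept.append(j)
    return kept


def first_principal_component(matrix, iterations=200):
    """Approximate PC1 loadings and per-cell scores by power iteration."""
    n_cells = len(matrix)
    means = [sum(column) / n_cells for column in zip(*matrix)]
    centered = [[v - m for v, m in zip(row, means)] for row in matrix]
    n_genes = len(means)
    # Non-uniform start vector; a start exactly orthogonal to PC1 would fail.
    loading = [1.0 / (j + 1) for j in range(n_genes)]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, loading)) for row in centered]
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(n_genes)]
        norm = sqrt(sum(v * v for v in updated))
        if norm == 0.0:
            break
        loading = [v / norm for v in updated]
    scores = [sum(a * b for a, b in zip(row, loading)) for row in centered]
    return loading, scores


# Invented toy matrix: three cells of one type, three of another, six genes.
# Gene 4 is detected in a single cell and gene 5 is constant; both are removed.
toy = [
    [8.0, 7.0, 1.0, 0.0, 0.0, 2.0],
    [9.0, 8.0, 0.0, 1.0, 0.0, 2.0],
    [7.0, 9.0, 1.0, 1.0, 0.0, 2.0],
    [1.0, 0.0, 8.0, 9.0, 0.0, 2.0],
    [0.0, 1.0, 9.0, 7.0, 0.0, 2.0],
    [1.0, 1.0, 7.0, 8.0, 3.0, 2.0],
]
kept = filter_genes(toy)
filtered = [[row[j] for j in kept] for row in toy]
loading, scores = first_principal_component(filtered)
```

In this toy case the sign of the PC1 score separates the two groups of cells, and the genes with the largest absolute loadings are those that differ most between the groups. In practice a linear-algebra library would normally be used for the decomposition.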

In addition to classifying the epithelial cell populations in the distal 
lung at E18.5, our analysis identified sets of genes specific to each pop- 
ulation, providing a battery of previously unknown markers that can 
be used to distinguish cells from each alveolar and bronchiolar lineage. 

We used Guilt-by-Association and correlation analysis to assess the 
significance of co-expression of genes in all cells belonging to a specific 
cell type (Methods, Fig. 2b and Supplementary Data). The large number 
of lineage-specific genes allowed us to annotate functions of individual 
cell types by gene ontology and pathway enrichment analysis”! (Extended 
Data Fig. 4a and Supplementary Data): AT1 cells were enriched in path- 
ways associated with extracellular matrix-receptor interaction, focal 
adhesion, tight and adherens junctions and regulation of the actin cyto- 
skeleton; AT2 cells were enriched for adipocytokine and PPAR signal- 
ling and for lysosome pathways; the Clara cell lineage was enriched for 
metabolism of xenobiotics by cytochrome P450, drug metabolism and 
glutathione metabolism; and ciliated cells showed enrichment for pro- 
gesterone-mediated oocyte maturation and cell cycle pathways. Further- 
more, we identified transcription factors, receptors and ligands whose 
expression profile across all single cells was strongly correlated with the 
individual cell types (Extended Data Fig. 4b, c). 

Among the numerous newly identified putative cell-type markers, 
several are of particular interest. Hopx transcription factor was previ- 
ously reported to regulate alveolar maturation by suppressing surfactant 
protein production in AT2 cells”; our data show that Hopx is expressed 
in BPs, turns off in maturing AT2 cells and is maintained in AT1 cells. 
We validated the AT1-specific expression of Hopx by transgenic labelling
and co-localization with two AT1 markers, Pdpn and Ager (Fig. 2c
and Extended Data Fig. 4e). We also found that Vegfa endothelial growth
factor is specifically expressed in the AT1 lineage, presumably serving
as a signal to activate nearby capillary endothelial cells; AT1-specific
expression was validated by single-cell qPCR (Extended Data Fig. 4d).

Egfl6, encoding a protein implicated in cell adhesion and cell differen- 
tiation, is specifically expressed in AT2 cells; AT2-specific expression was 
confirmed by multiplex in situ hybridization with the canonical AT2 


372 | NATURE | VOL 509 | 15 MAY 2014 


[Figure 2, panels a and b. a, Heat map of transcript levels, log₂(FPKM), colour scale 0–15, with replicate annotation. b, Bar graphs of the top 30 putative marker genes for the ciliated, Clara, AT1 and AT2 lineages plotted against corrected P value.]


Figure 2 | Single-cell transcriptome analysis discovers previously unknown 
markers. a, Hierarchical clustering of RNA-seq data from 80 single distal lung 
epithelial cells (E18.5, three biological replicates) identifies five molecularly 
distinct populations, assigned to alveolar and bronchiolar lineages based on the 
presence of canonical marker genes (asterisks) within the respective gene 
clusters (AT2 (red), Sftpb and Sftpc; AT1 (orange), Pdpn; ciliated (green), Foxj1;
Clara (blue), Scgb1a1). BPs (brown) co-express AT1 and AT2 markers. 

Each row represents a single cell, each column a gene (104 genes in total; 
Supplementary Data). Permutation analysis supports the significance of the 
presented clustering (P = 2.89 X 10°)”; Methods). FPKM, fragments per 
kilobase of transcript per 10⁶ mapped reads. b, Bar graphs showing the top 30
putative marker genes for each cell lineage inferred from the E18.5 single-cell
transcriptomes as a function of the multiple testing corrected P value for
each gene (Guilt-by-Association; Methods). Canonical markers are bold and
coloured. c, Validation of Hopx expression in AT1 cells. A lung section from a
transgenic Hopx-Cre-ERT2+/−;mTmG+/− adult mouse was co-stained for
AT1 marker Ager. Maximum intensity projections of confocal z-stacks show
that AT1 cells expressing membrane-localized green fluorescent protein
(GFP; green) also express Ager (white). Scale bar, 50 µm. d, Validation of Egfl6
expression in AT2 cells. Multiplexed in situ hybridization of E18.5 lungs shows
co-localization of probes targeting Egfl6 (green) and AT2 marker Sftpc (red)
messenger RNA. Inset, close-up of boxed region. Blue, 4′,6-diamidino-2-phenylindole
(DAPI)-stained nuclei. Scale bar, 50 µm. e, Validation of Krt15
expression in Clara cells. Immunofluorescent staining of E18.5 lungs with the
use of antibodies against Krt15 (red) and Clara cell marker Scgb1a1 (green).
Blue, DAPI-stained nuclei. Krt15 is also expressed outside the epithelium. Scale
bar, 50 µm.
marker Sftpc (Fig. 2d). Krt15, a component of intermediate filaments,
was specifically expressed in the Clara cell lineage, which we validated
by co-staining with the canonical Clara cell marker Scgb1a1 (Fig. 2e).
Finally, we used single-cell multiplexed qPCR to validate the lineage-specific
expression of six additional genes including Itgb4 and Top2a
for ciliated cells, Cftr, Cebpa, Sftpd and Id2 for the AT2 lineage, and
Vegfa for the AT1 lineage (Extended Data Fig. 4d). Most genes specifically
expressed by the AT2 lineage at E18.5 were also detected by single-cell
RNA-seq in mature AT2 cells of an adult mouse lung, whereas
genes specific to AT1, Clara or ciliated cells were not expressed or were
expressed only at a low level (Extended Data Fig. 4f). Thus, we identified
a large number of new, and potentially more specific, markers for
various biological processes and stages relevant to alveolar and bronchiolar
maturation.

Identification of the progenitor and differentiated cell types at E18.5 
prompted further investigation of developmental intermediates in the 
alveolar maturation pathway. Sacculation of distal airway tubules com- 
mences at E16.5, and the distal epithelium is dominated by alveolar 
progenitor cells at this time®. We therefore measured transcript levels 
of ten known marker genes in 107 single cells of the distal lung epithelium
at both E16.5 (33 cells) and E18.5 (74 cells) with multiplexed single-cell
qPCR (Extended Data Fig. 5a–d). The marker gene expression profile
and PCA identified Clara and ciliated cells distinct from alveolar lineages 
at both E16.5 and E18.5, corroborating the earlier separation of bronch- 
iolar from alveolar maturation pathways. However, gene expression 
of alveolar cells showed no segregation into AT1 and AT2 lineages at 
E16.5, because marker genes for both subpopulations were commonly 
expressed by all cells, whereas by E18.5 they had clearly separated. This 
is consistent with a recent temporo-spatial marker study suggesting 
that AT1 and AT2 lineages emerge from a common BP*. In addition to 
BPs and mature alveolar cells at E18.5, we observed cells in intermediate 
maturation stages on the basis of partial co-expression of AT1 and AT2 
marker genes. We used the newly identified genes specific for each 
mature alveolar cell type to subclassify these intermediates and thereby 
reconstruct the molecular pathway of differentiation of BPs into AT1 
and AT2 lineages, grouping the genes into early and late markers of either 
lineage (Fig. 3). We confirmed the presence of developmental inter- 
mediates showing heterogeneity in marker gene expression by immu- 
nofluorescence (Extended Data Fig. 5f-i). 
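
The idea that co-expression of AT1 and AT2 markers distinguishes BPs from committed cells can be made concrete with a small rule-based sketch. It is not the classification used in this study, which relied on unbiased clustering; the marker sets are the canonical ones named in the text, whereas the function name, the labels and the detection threshold are hypothetical choices for illustration.

```python
AT1_MARKERS = ("Pdpn", "Ager")
AT2_MARKERS = ("Sftpc", "Sftpb")


def classify_alveolar_cell(expression, threshold=1.0):
    """Label a cell from a gene -> expression-level mapping.

    `threshold` is an assumed detection cut-off, not a value from the paper.
    Genes missing from the mapping are treated as not detected.
    """
    has_at1 = all(expression.get(gene, 0.0) > threshold for gene in AT1_MARKERS)
    has_at2 = all(expression.get(gene, 0.0) > threshold for gene in AT2_MARKERS)
    if has_at1 and has_at2:
        return "BP-like"   # co-expresses AT1 and AT2 markers
    if has_at1:
        return "AT1-like"
    if has_at2:
        return "AT2-like"
    return "unassigned"


# Invented example cells.
cell_a = {"Pdpn": 5.0, "Ager": 4.0, "Sftpc": 0.2, "Sftpb": 0.0}
cell_b = {"Pdpn": 6.0, "Ager": 3.5, "Sftpc": 9.0, "Sftpb": 7.0}
labels = [classify_alveolar_cell(cell_a), classify_alveolar_cell(cell_b)]
```

A fixed threshold of this kind cannot capture the graded, partial co-expression seen in the intermediates, which is why the subclassification in the text draws on the larger sets of early and late lineage markers.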

The constructed hierarchy identified transcription factors, receptors 
and ligands showing expression changes that correlated with specific 
transitions in the maturation states of alveolar cells (Extended Data 
Fig. 5e). The transcription factors Sox9 and Cited2 were expressed in 
BP and AT2 cells, whereas Hes1 was expressed in BP and AT1 cells. We 
did not detect any transcription factors that initiated expression exclu- 
sively in either of the maturing alveolar lineages, suggesting that lineage 
commitment involves the downregulation of factors active in alveolar 
progenitor cells rather than the de novo expression of a lineage-specific 
transcription factor. Ligands were expressed in either BP and AT2 (Cxcl15, 
Cmtm8) or BP and AT1 cells (Sema3a, Tgfb, Vegfa), and receptors were 
expressed in a BP (Fzd2), BP/AT2 (Fgfr2) or BP/AT1 (Gprc5a) pattern.
These results show that our approach can be used to characterize tran- 
scriptional profiles of transient cellular intermediates during a dynamic 
maturation process within a complex tissue. 

Finally, we explored temporal changes within the distal lung by sequenc- 
ing additional single-cell transcriptomes before sacculation (E14.5; 45 
progenitor cells), early in sacculation (E16.5; 27 progenitor cells) and 
long after sacculation (adult; 46 transgenically labelled AT2 cells). We 
performed unsupervised hierarchical clustering analysis of Sftpc-positive 
cells (124 cells), using genes with the highest principal-component load- 
ings in a PCA analysis (Fig. 4a, b and Supplementary Data). Cells clus- 
tered in groups that were highly correlated with developmental stage 
of cell isolation, in sequence from early progenitors (EPs), through BP 
and nascent AT1 and AT2 cells, to mature AT2 cells. Thus, AT2 cell 
maturation occurs in a progressive manner through transcriptionally 
distinct intermediates that can be robustly discriminated by expression 
Figure 3 | Molecular profiles distinguish between developmental
intermediates during the differentiation of AT1 and AT2 cells from a 
common BP. Developmental sequence of AT1 (orange) and AT2 (red) 
specification from a common BP (brown). Two and three maturation 
intermediates were identified in the specification processes of AT2 and AT1 cell 
types, respectively, on the basis of the expression of known and previously 
unknown marker genes for both alveolar lineages measured by single-cell 
RNA-seq. Genes were grouped into early and late markers of each lineage. 
Arrows, differentiation pathway; grey braces, change in transcript level of 
respective genes, tip pointing towards lower expression. 


profile throughout embryonic and adult life. The population of EP cells 
co-expresses the AT2 marker Sftpc and the AT1 marker Pdpn, indi- 
cating that these cells are located at the tips of the branching epithelial 
tree (Extended Data Fig. 6a). EP cells segregate into two subgroups, one 
exclusive to E14.5 (early EPs; EP-A) and the other present at both E14.5 
and E16.5 (late EPs; EP-B), indicating that cellular differentiation is not 
fully synchronous throughout the lung. Both EP populations show high 
expression of genes involved in cell cycle progression and chromosome 
dynamics (gene groups IIIa, IIb and IIa; Fig. 4a, c), which are down- 
regulated during the transition of EPs to BPs. The downregulated EP- 
specific genes include transcription factor Sox11, which is expressed in 
the developing airway epithelium and causes an alveolar defect when 
knocked out”, and also Tuba1a, a putative target of Sox11*; this sug- 
gests that Sox11 could be involved in maintaining the proliferative com- 
petence of EP cells. At E18.5, BP cells expressing both AT1 and AT2 
markers appear in conjunction with intermediate populations with a 
decreased expression of AT1 markers (nascent AT2) or AT2 markers 
(nascent AT1). Mature AT2 cells are characterized by the expression 
of genes involved in respiratory gas exchange and immune response 
(gene group IV) and were only detected at adult stages (Fig. 4a). The over- 
all number of genes as well as the total number of transcripts expressed 
in each cell were strongly correlated with its differentiation state: early 
progenitor cells at E14.5 expressed up to 6,000 genes, whereas mature 
AT2 cells expressed about fourfold to sixfold fewer genes (Fig. 4a and 
Extended Data Fig. 7a). Thus, we followed the full life cycle of Sftpc+
cells and identified seven gene sets that robustly distinguish between 




[Figure 4, panels a–c. a, Heat map of transcript level (FPKM, log₂; scale 0–15) for cells from E14.5, E16.5, E18.5 and adult lung, grouped as early progenitor, nascent AT1, BP and nascent AT2, with functional annotations and transcription factors (TFs) for each gene group. b, PCA plot. c, Violin plots of median transcript level for the EP-A, EP-B, nascent AT1, BP, nascent AT2 and mature AT2 populations.]

Figure 4 | Single-cell RNA-seq of Sftpc+ cells at E14.5, E16.5, E18.5 and
in the adult mouse lung explains progressive transcriptional states of the
AT2 cell lineage throughout its life cycle. a, Hierarchical clustering of 124
Sftpc+ cells from distal mouse lung epithelium of embryonic (E14.5, E16.5 and
E18.5) and adult mice based on genes with highest principal-component 
loadings (Supplementary Data) in an unbiased PCA analysis (shown in b) of all 
cells and genes. Single cells are shown in rows; genes are shown in columns. 
Bars at the right show Sftpc and Pdpn expression, as well as the number of genes 
expressed by each single cell (see also Extended Data Fig. 7). Functional gene 
ontology enrichments and transcription factors (TFs) specific to each gene 


multipotential, bipotential, nascent and mature AT2 cell states (Ex- 
tended Data Fig. 6b). 
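
The relationship between differentiation state and the number of genes expressed per cell can be illustrated with a minimal sketch. The detection cut-off, the function name and the two toy profiles are assumptions made for the example and do not reproduce the values or criteria used in the study.

```python
def genes_detected(profile, min_fpkm=1.0):
    """Count genes whose FPKM is at or above an assumed detection cut-off."""
    return sum(1 for fpkm in profile.values() if fpkm >= min_fpkm)


# Invented toy profiles with placeholder gene names.
early_progenitor = {
    "gene_a": 40.0, "gene_b": 120.0, "gene_c": 15.0, "gene_d": 8.0,
    "gene_e": 30.0, "gene_f": 55.0, "gene_g": 12.0, "gene_h": 6.0,
}
mature_at2 = {
    "gene_a": 0.0, "gene_b": 0.4, "gene_c": 900.0, "gene_d": 0.0,
    "gene_e": 0.0, "gene_f": 0.2, "gene_g": 0.0, "gene_h": 400.0,
}

fold_fewer = genes_detected(early_progenitor) / genes_detected(mature_at2)
```

The count depends on the chosen cut-off and on sequencing depth, so comparisons of this kind are most meaningful between libraries sequenced to similar depth.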

We expect that a similar strategy to that pursued here can be applied 
to almost any tissue to empirically classify and characterize the full set 
of developing and mature cell types, explain the molecular regulation of 
these distinct populations, and explore how they are disrupted in disease. 


METHODS SUMMARY 


Single-cell suspensions were prepared from micro-dissected distal regions of embry- 
onic mouse lung at E14.5, E16.5 and E18.5 and also from adult mouse lung. Epithelial 
cells were purified by either magnetic bead-activated cell sorting (MACS; Miltenyi 
Biotec) using CD45 depletion and EpCAM enrichment or by fluorescence-activated 
cell sorting (FACS) of transgenically labelled cells. Single cells were captured on a 
microfluidic chip on the C1 system (Fluidigm) and whole-transcriptome amplified 
cDNA was prepared on chip using the SMARTer Ultra Low RNA kit for Illumina
(Clontech). Single-cell libraries were constructed as described previously’ with the 
use of the Illumina Nextera XT DNA Sample Preparation kit and sequenced to a 
group (bottom grey-shaded bars) are shown (Supplementary Data). A similar
analysis following the Clara cell lineage throughout development is shown in 
Extended Data Fig. 8. b, PCA of single-cell transcriptomes based on genes 
detected in more than two cells. Cells cluster into three major populations 

on the basis of different scores along the first two principal components. 

c, Violin plots depicting the course of expression of each of seven distinct gene 
groups across the six cell populations. Each violin plot shows the frequency 
distribution of the mean transcript level (log₂-transformed FPKM) of all genes
per cell. 


depth of (2–5) × 10⁶ read pairs (HiSeq 2000; Illumina). Expression levels of transcripts
were quantified with TopHat/Cufflinks”. Single-cell RNA-seq data was
validated by the preparation of single-cell qPCR amplicons of 96 genes on a C1 micro- 
fluidic chip (Fluidigm) followed by multiplexed qPCR with BioMark (Fluidigm) as 
described previously”®. Single-cell gene expression data was analysed with custom 
R scripts” (Supplementary Data). Gene ontology enrichment analysis was performed 
with DAVID informatics resources 6.7 (ref. 21). Additional experiments were per- 
formed with immunofluorescence and in situ hybridization (ACD’s RNAscope 
In situ Hybridization Technology). 
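
For orientation, the FPKM unit used throughout (fragments per kilobase of transcript per 10⁶ mapped reads) can be written out as a short calculation. This is a sketch of the definition only: Cufflinks estimates transcript abundances with a statistical model rather than by this direct division, and the function names and the pseudocount in the log transform are assumptions made for the example.

```python
from math import log2


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fpkm(value, pseudocount=1.0):
    """log2 transform; the pseudocount (an assumption) keeps zero values finite."""
    return log2(value + pseudocount)


# 100 fragments on a 2,000-bp transcript in a library of 4 million mapped fragments.
example = fpkm(100, 2000, 4_000_000)
```

Dividing by transcript length and library size makes values comparable between genes of different length and between libraries of different depth, which is what allows single cells and bulk samples to be shown on a common scale.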


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Mouse strains. Timed-pregnant C57BL/6J females (JAX) were used for all embryonic
time points reported; gestation age was verified by crown-rump length before
use. For adult mice, a transgenic-labelling approach was employed to enrich for AT2 
cells. Mice were bred to be homozygous for a knock-in allele into Sftpc encoding for 
a reverse tetracycline transactivator (SftpC-Cre-ERT2-rtTA)”* and heterozygous for 
an inserted transgene that drives the expression of a GFP tagged human histone 1 
in a tetracycline-dependent manner (tetO-HIST1H2BJ/GFP). To validate the expression
of Hopx, mice were bred to be heterozygous for the knock-in of a tamoxifen
inducible Cre recombinase (Cre-ERT2) construct into the Hopx gene (Hopx-Cre- 
ERT2)” and heterozygous for a transgenic insertion into the Rosa26 locus encod- 
ing a two-colour, membrane-tethered fluorophore reporter that switches expression 
from a red to green fluorophore on Cre-mediated recombination (mTmG) (cross 
of B6;129S-Hopx^tm2.1(cre/ERT2)Joe/J and B6.129(Cg)-Gt(ROSA)26Sor^tm4(ACTB-tdTomato,-EGFP)Luo/J).
Genotyping was performed using PCR with published primer sets from genomic 
DNA extracted from tails by Proteinase K (Sigma) digestion and precipitation with 
ethanol. Mice were housed in filtered cages and all experiments were performed in 
accordance with approved Institutional Animal Care and Use Committee protocols. 
Isolation and disaggregation of lung tissue. Single-cell experiments were per- 
formed on embryonic mouse lung at E14.5, E16.5 and E18.5 and also on adult 
mouse lung. In general, embryonic experiments were performed on pooled sibling 
lungs of one litter (five to seven lungs per pool). One of three replicate experiments 
at E18.5 (cells referred to as ‘E18_2_Cxx’ in Supplementary Data) was performed 
on a single embryonic lung. 

Adult mice were euthanized by administration of CO2. For time points E14.5, E16.5
and E18.5, embryos were removed and lungs were isolated en bloc without per- 
fusion and pooled by litter (five to seven embryos) for further processing. Lungs 
from E14.5 and E16.5 time points were dissociated in Dispase (BD Biosciences) and 
triturated with glass Pasteur pipettes until a single-cell suspension was attained. 
For E18.5 and adult time points, either total lung (adult) or peripheral lobe edges 
(E18.5) were minced with a razor blade into 1 mm³ fragments, suspended in 5 ml of
digestion buffer consisting of elastase (3 U ml⁻¹; Worthington Biochemical Corporation)
and DNase I (0.33 U ml⁻¹; Roche) in DMEM/F12 medium, incubated
with frequent agitation at 37 °C for 45 min, and triturated briefly with a 5-ml pipette.
For all time points, an equal volume of DMEM/F12 supplemented with 10% FBS and
penicillin-streptomycin (1 U ml⁻¹; Thermo Scientific) was added to single-cell suspensions
before passing the suspension through a 100-µm mesh filter (Fisher) and
centrifugation at 400g for 10 min. Pelleted cells were resuspended in red blood cell
lysis buffer (BD Biosciences), incubated for 2 min, passed through a 40-µm mesh
filter (Fisher), centrifuged at 400g for 10 min and then resuspended in sorting
buffer (PBS supplemented with 0.05% BSA and 2 mM EDTA).

Purification of embryonic distal lung epithelial cells by MACS. Lung epithelial 
cells for embryonic time points (E14.5, E16.5 and E18.5) were purified by MACS 
using MS columns (Miltenyi Biotec) in MACS buffer (2 mM EDTA, 0.5% BSA in 
PBS, filtered and degassed) in accordance with the protocol provided by the vendor. 
Before loading, the single-cell suspension was passed through a 35-µm cell strainer
(BD Biosciences). Leukocytes and alveolar macrophages were removed by deple- 
tion with an antibody against the surface antigen CD45 conjugated to magnetic 
beads (Miltenyi Biotec) followed by enrichment for epithelial cells, incubating first 
with a biotinylated primary antibody for EpCAM (clone G8.8; eBioscience) followed 
by a secondary antibody against biotin conjugated to magnetic beads (Miltenyi Biotec).

Purification of adult AT2 cells by FACS. For AT2 cells from the adult lung, an
adult Sftpc-Cre-ERT2-rtta+/− tetO-HIST1H2BJ-GFP+/− mouse was injected with
2 mg of doxycycline (Sigma) three days prior to the single cell experiments. After 
isolation and disaggregation of the lung, the single-cell suspension was incubated 
with a viability stain (Sytox Blue; Invitrogen) for 15 min and viable GFP+ cells were
sorted by FACS (Aria II; BD Biosciences) into DMEM containing 10% FBS. 

Capturing of single cells and preparation of cDNA. Single embryonic lung epithelial
cells were captured on a medium-sized (10-17 µm cell diameter) microfluidic
RNA-seq or STA chip (Fluidigm) using the Fluidigm C1 system. To ensure
unbiased and comprehensive profiling of all distal lung epithelial cells, an initial 
experiment was performed with a microfluidic chip with a 17-25-µm capture range;
however, no cells with diameter greater than ~15 µm were captured, indicating that
no major cell populations were missed by using the smaller capture range (Extended 
Data Fig. 2b). Cells were loaded onto the chip at a concentration of 300-500 cells µl⁻¹,
stained for viability (LIVE/DEAD cell viability assay; Molecular Probes, Life Tech- 
nologies) and imaged by phase-contrast and fluorescence microscopy to assess the 
number and viability of cells per capture site. Only single, live cells were included 
in the analysis. For RNA-seq experiments, cDNAs were prepared on chip using the
SMARTer Ultra Low RNA kit for Illumina (Clontech). ERCC (External RNA 
Controls Consortium) RNA spike-in Mix (Ambion, Life Technologies) was added 
to the lysis reaction and processed in parallel to cellular messenger RNA. For qPCR 
experiments, amplicons were prepared with pooled DELTAgene assays (Fluidigm)
and Ambion (Life Technologies) Cells to CT lysis and pre-amplification kit, using
the protocol provided by Fluidigm. 

RNA-seq library construction. Single-cell cDNA size distribution and concentration
were assessed on a capillary electrophoresis-based fragment analyser (Advanced
Analytical). Illumina libraries were constructed in 96-well plates using the Illumina
Nextera XT DNA Sample Preparation kit as described previously’ using the pro- 
tocol supplied by Fluidigm. For each C1 experiment, a bulk RNA control (about 200 
cells) and a no-cell negative control were processed in parallel in PCR tubes, using 
the same reagent mixes as used on chip. Libraries were quantified by Agilent Bio- 
analyzer, using High Sensitivity DNA analysis kit, and also fluorometrically, using 
Qubit dsDNA HS Assay kits and a Qubit 2.0 Fluorometer (Invitrogen, Life Technologies).

DNA sequencing. Single-cell Nextera XT (Illumina) libraries of one experiment
were pooled and sequenced 100 base pairs (bp) paired-end on Illumina HiSeq
2000 to a depth of (2-6) × 10⁶ reads (three replicate experiments of distal mouse
lung epithelial cells at E18.5, one experiment at E14.5 and one experiment on adult
AT2 cells) or 150 bp paired-end on Illumina MiSeq (one experiment at E16.5) to a
depth of 100,000-550,000 reads with v3 chemistry. CASAVA 1.8.2 was used to sep- 
arate out the data for each single cell by using unique barcode combinations from 
the Nextera XT preparation and to generate *.fastq files. 

Microfluidic single-cell multiplexed qPCR. Single-cell multiplexed qPCR was 
performed in an M96 quantitative PCR DynamicArray on the Fluidigm Biomark
instrument as described previously”, using a panel of 96 DELTAgene assays (Fluidigm; 
Supplementary Table 2). In three of five single-cell qPCR experiments, ERCC 
spike-in transcripts (Ambion, Life Technologies) were added to each single-cell lysis
reaction on chip. Primer pairs for 6 of the 92 exogenous RNA spike-ins (ERCC 
spike-ins ERCC-00033, ERCC-00136, ERCC-00044, ERCC-00164, ERCC-00054 
and ERCC-00074) were added to the preamplification reaction on chip and were 
subsequently used in the multiplexed qPCR experiment to detect the transcript 
level of each RNA spike-in. qPCR detection of the spike-in transcripts was later used 
to convert measured Ct values to approximate numbers of transcripts in a subset of
90 genes (Extended Data Fig. 7). 

Processing, analysis and graphic display of single-cell RNA-seq data. Raw reads 
were pre-processed with the sequence-grooming tools FASTQC (ref. 30), cutadapt (ref. 31) and
PRINSEQ (ref. 32) followed by sequence alignment with the Tuxedo suite (Bowtie (ref. 33), Bowtie2
(ref. 34), TopHat (ref. 35)) and SAMtools (ref. 36), using default settings. See Supplementary Data
for information about the number of total reads and the percentage of mapped 
reads for each single cell. Transcript levels were quantified as fragments per kilobase 
of transcript per million mapped reads (FPKM) generated by TopHat/Cufflinks. 
Where depth matching was done, Seqtk (H. Li, https://github.com/lh3/seqtk/) was
used to select raw reads randomly from each library, and the same pre-processing 
and alignment pipelines were used to obtain FPKM values for the depth-matched 
samples. Limit of detection of microfluidic single-cell RNA-seq was determined by 
analysing the correlation between the concentration of exogenous ERCC spike-in 
mRNA sequences and their respective mean FPKM values as measured by RNA-seq 
(Extended Data Fig. 3c). The spike-in sequences reflect a diverse range of sequence 
content and length, they have low homology with eukaryotic transcripts because 
they are from microbial sources, and they span a large range of concentrations to 
allow an empirical determination of the limit of detection”**’’. The limit of detec- 
tion was on the order of 0.5 molecules per reaction chamber, which is reflected as 
an FPKM value of ~1 (or 0 on a log2 scale). Transcripts with an FPKM value below
or equal to 1 were therefore considered not expressed. Cells not expressing either of
two housekeeping genes Actb and Gapdh (encoding β-actin and glyceraldehyde-3-phosphate
dehydrogenase, respectively), or expressing them at levels more than three standard
deviations below the mean, were scored as unhealthy and removed from the
analysis. After applying this filter, a total of 80 cells remained for three replicate
experiments at E18.5 (2 × pooled sibling lungs (20 and 26 cells), 1 × single lung
(34 cells)), 45 cells remained for one experiment at E14.5, 27 cells remained for one 
experiment at E16.5 and 46 cells remained for an experiment of adult AT2 cells 
yielding 198 single cells in total. 
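
The housekeeping-gene filter described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script used in the study: the function name `healthy_cells`, the dictionary layout and the toy FPKM values are hypothetical, "expressed" is taken to mean FPKM > 1, and the three-standard-deviation cutoff is applied to log2(FPKM) using the sample standard deviation.

```python
import math
import statistics


def healthy_cells(fpkm, housekeeping=("Actb", "Gapdh"), lod=1.0, n_sd=3.0):
    """Return the sorted IDs of cells that pass the housekeeping-gene filter.

    fpkm maps cell ID -> {gene: FPKM}. A gene counts as expressed when its
    FPKM is above `lod` (an assumption based on the FPKM <= 1 rule in the text).
    """
    keep = set(fpkm)
    for gene in housekeeping:
        # log2(FPKM) for every cell that expresses this housekeeping gene
        expressed = {
            cell: math.log2(genes[gene])
            for cell, genes in fpkm.items()
            if genes.get(gene, 0.0) > lod
        }
        keep &= set(expressed)  # drop cells that do not express the gene
        levels = list(expressed.values())
        if len(levels) > 1:
            cutoff = statistics.mean(levels) - n_sd * statistics.stdev(levels)
            # drop cells whose level lies below mean - n_sd * s.d.
            keep -= {cell for cell, level in expressed.items() if level < cutoff}
    return sorted(keep)


# Toy data (made up): 18 ordinary cells, one without Gapdh, one with very low Actb.
cells = {f"c{i:02d}": {"Actb": 1024.0, "Gapdh": 512.0} for i in range(18)}
cells["c18"] = {"Actb": 1024.0, "Gapdh": 0.0}
cells["c19"] = {"Actb": 2.0, "Gapdh": 512.0}
passed = healthy_cells(cells)
print(len(passed), "of", len(cells), "cells pass the filter")
```

With few cells a three-standard-deviation cutoff can never be crossed (the largest possible deviation of one value among n is (n − 1)/√n sample standard deviations), so a filter of this kind is only informative for populations of roughly a dozen cells or more.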

For the lung epithelial cells at E18.5, we detected between 1,017 and 4,998 expressed 
genes per single cell, 10,946 in the union of all single cells and 8,653 in the 200-cell
control bulk sample, indicating the heterogeneity of the analysed single cells. A 
total of 81 genes were commonly expressed in all single cells (Supplementary Table 1), 
which were mainly enriched for general processes such as translation. 

FPKM values were converted to an approximate number of transcripts by using 
the correlation between the number of transcripts of exogenous spike-in mRNA 
sequences and their respective measured mean FPKM values (Extended Data Fig. 3c). 
The number of spike-in transcripts per single-cell lysis reaction was calculated from 
the concentration of each spike-in provided by the vendor (Ambion, Life Tech- 
nologies), the approximate volume of the lysis chamber (10 nl) and the dilution of 
spike-in transcripts in the lysis reaction mix (×40,000). Transcript levels were converted
to logarithmic space by taking log2(FPKM/number of transcripts). When
calculating the characteristic single-cell expression (Extended Data Fig. 3b, d, f, g),
we used either the mean FPKM or the median FPKM value of each gene across all
single cells transformed to the log2 space. To calculate the coefficient of variation of
each gene across single cells (Extended Data Fig. 3b), the standard deviation of the
log2-transformed FPKM values of a gene across all single cells was divided by the
mean log2 FPKM value of the same gene.
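
The spike-in arithmetic, the FPKM-to-transcript conversion and the coefficient of variation described above can be sketched as follows. This is a minimal sketch, not the study's code. It assumes that the vendor concentration is given in attomoles per microlitre, that the calibration is a straight line in log2-log2 space fitted by ordinary least squares, and that the sample standard deviation is used. All function names and the calibration points are hypothetical.

```python
import math
import statistics

AVOGADRO = 6.02214076e23  # molecules per mole


def spike_in_molecules(conc_amol_per_ul, chamber_nl=10.0, dilution=40000.0):
    """Expected number of spike-in molecules in one lysis chamber."""
    mol_per_ul = conc_amol_per_ul * 1e-18 / dilution  # stock -> diluted mix
    return mol_per_ul * AVOGADRO * (chamber_nl * 1e-3)  # nl -> microlitres


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fpkm_to_transcripts(fpkm, slope, intercept):
    """Invert log2(FPKM) = slope * log2(molecules) + intercept."""
    return 2.0 ** ((math.log2(fpkm) - intercept) / slope)


def cv_log2(fpkm_values):
    """Coefficient of variation of log2(FPKM); assumes all values are > 0."""
    logs = [math.log2(v) for v in fpkm_values]
    return statistics.stdev(logs) / statistics.mean(logs)


# Hypothetical calibration points: (molecules per chamber, mean FPKM).
molecules = [0.5, 4.0, 32.0, 256.0]
mean_fpkm = [1.0, 8.0, 64.0, 512.0]
slope, intercept = fit_line([math.log2(m) for m in molecules],
                            [math.log2(f) for f in mean_fpkm])
print(spike_in_molecules(30000.0))  # a 30,000 amol per microlitre stock (made-up value)
print(fpkm_to_transcripts(64.0, slope, intercept))
```

The made-up calibration points were chosen so that 0.5 molecules corresponds to an FPKM of 1, echoing the limit of detection quoted in the text; real calibration curves are those of Extended Data Fig. 3c.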

Saturation plots (Extended Data Fig. 3a and 7a) were generated as described 
previously’. In brief, a corresponding number of millions of raw reads were randomly 
selected from each sample library and then, using the same alignment pipeline, FPKM 
values were called for each gene. This random subsampling was repeated for each
sample replicate to give a total of four subsampled data sets per point, and the mean number
of genes with an FPKM greater than 1 was plotted. For generating the ‘single-cell
ensemble’ data set, raw reads from all the single-cell RNA-seq libraries were bio- 
informatically pooled to mimic a bulk RNA-seq experiment. 
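
The subsampling behind one point of a saturation curve can be sketched as below. This is a deliberately simplified stand-in: in the study each subsample was re-aligned and genes were called at FPKM > 1, whereas here each read is represented only by the gene it maps to and a gene counts as detected when it has at least `min_reads` reads. The function names, the threshold and the toy library are hypothetical.

```python
import random
import statistics


def genes_detected(read_genes, min_reads=1):
    """Count genes supported by at least `min_reads` reads."""
    counts = {}
    for gene in read_genes:
        counts[gene] = counts.get(gene, 0) + 1
    return sum(1 for n in counts.values() if n >= min_reads)


def saturation_point(read_genes, depth, n_subsamples=4, seed=0):
    """Mean and s.e.m. of detected genes over repeated random subsamples."""
    rng = random.Random(seed)
    detected = [
        genes_detected(rng.sample(read_genes, depth))  # sampling without replacement
        for _ in range(n_subsamples)
    ]
    sem = statistics.stdev(detected) / len(detected) ** 0.5
    return statistics.mean(detected), sem


# Toy library (made up): each entry is the gene that one read maps to.
library = ["geneA"] * 500 + ["geneB"] * 50 + ["geneC"] * 5 + ["geneD"]
for depth in (10, 100, len(library)):
    print(depth, saturation_point(library, depth))
```

Rare genes are found only at higher depths, which is the behaviour a saturation plot is meant to expose.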

Custom R scripts” were used to perform principal component analysis (PCA), 
hierarchical clustering, Guilt-by-Association and permutation analysis as well as 
to construct violin plots, correlation plots and histograms. The scripts can be found 
in Supplementary Data. PCA analysis was performed on all cells using all genes 
expressed in more than two cells and with a variance in transcript level (log2(FPKM))
across all single cells greater than 0.5. Subsequently, genes with the highest principal- 
component loadings (highest absolute correlation coefficient with one of the first 
three or four principal components) were identified. Hierarchical clustering was 
performed on cells and on the genes identified by PCA, using Euclidean or corre- 
lation distance metric. 
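
The gene pre-selection that precedes PCA can be sketched as follows. This covers only the filtering step, not PCA or clustering, and it is not the authors' R code. It assumes that a gene is "expressed" when FPKM > 1, that FPKM values at or below 1 are floored to 1 before taking log2, and that the sample variance is meant; the function name and toy values are hypothetical.

```python
import math
import statistics


def select_genes(fpkm_by_gene, min_cells=2, min_var=0.5, lod=1.0):
    """Keep genes expressed in more than `min_cells` cells whose
    log2(FPKM) variance across cells exceeds `min_var`."""
    selected = []
    for gene, values in fpkm_by_gene.items():
        n_expressing = sum(1 for v in values if v > lod)
        logs = [math.log2(max(v, lod)) for v in values]  # floor at the detection limit
        if n_expressing > min_cells and statistics.variance(logs) > min_var:
            selected.append(gene)
    return selected


# Toy FPKM profiles across six cells (values are made up).
profiles = {
    "variable": [1024.0, 1024.0, 1024.0, 0.0, 0.0, 0.0],
    "uniform": [8.0, 8.0, 8.0, 8.0, 8.0, 8.0],
    "rare": [1024.0, 1024.0, 0.0, 0.0, 0.0, 0.0],
}
print(select_genes(profiles))
```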

The specificity of the hierarchical clustering in Fig. 2a identifying five distinct 
cell populations was assessed with a permutation analysis approach. The sum of 
squares within groups (SSW) was therefore calculated for the cell grouping pre- 
sented in Fig. 2a as well as for 50,000 random permutations thereof, keeping the 
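size of cell groups and the total number of groups constant. With $x_{ij}^{k}$ being the
transcript level of gene $j$ in cell $i$ belonging to group $k$, the SSW can be calculated as

$$\mathrm{SSW} = \sum_{k=1}^{K} \sum_{j=1}^{m} \sum_{i=1}^{n_k} \left( x_{ij}^{k} - \bar{x}_{j}^{k} \right)^{2}$$

with $\bar{x}_{j}^{k} = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ij}^{k}$ being the mean transcript level of gene $j$ in all cells $i = 1, 2, \ldots, n_k$
belonging to group $k$ (here $K$ denotes the number of groups, $m$ the number of genes and $n_k$ the number of cells in group $k$). The SSWs for all 50,001 permutations were normally distributed
and the SSW for our chosen clustering was significantly lower than that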
for all other permutations (P = 2.89 x 10/72), 
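
The permutation analysis can be sketched in a few lines of Python. The sketch shuffles group labels, which keeps the group sizes and the number of groups constant, and recomputes the within-group sum of squares each time. It reports the empirical fraction of permutations with an SSW at least as low as the observed one, which is an assumption about how to summarize the result and not the normal-distribution-based P value quoted in the text. Function names and toy data are hypothetical.

```python
import random


def ssw(data, labels):
    """Within-group sum of squares; data[i][j] is gene j in cell i."""
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(label, []).append(row)
    total = 0.0
    for rows in groups.values():
        n = len(rows)
        for j in range(len(rows[0])):
            mean_j = sum(r[j] for r in rows) / n  # group mean of gene j
            total += sum((r[j] - mean_j) ** 2 for r in rows)
    return total


def permutation_ssw(data, labels, n_perm=1000, seed=0):
    """Observed SSW, permuted SSWs and the fraction of permutations
    whose SSW is less than or equal to the observed value."""
    rng = random.Random(seed)
    observed = ssw(data, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes and group count are preserved
        null.append(ssw(data, shuffled))
    fraction = sum(1 for s in null if s <= observed) / n_perm
    return observed, null, fraction


# Toy data (made up): two well-separated groups of four cells, two genes each.
group_a = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]]
group_b = [[x + 10.0, y + 10.0] for x, y in group_a]
observed, null, fraction = permutation_ssw(group_a + group_b, ["A"] * 4 + ["B"] * 4)
print(observed, min(null), fraction)
```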

When Sftpc+ cells were isolated from all single-cell RNA-seq data sets (Fig. 4), a
Sftpc transcript level of log2(FPKM) = 10 was chosen as threshold to separate cells
with background Sftpc expression from cells with high Sftpc expression (referred to
as Sftpc+ cells).

To search for further previously unknown cell-type markers and cell-type-specific
transcription factors or receptors/ligands besides the genes identified by PCA,
we defined a ‘perfect marker gene’ for each cell type with a high transcript level
(log2(FPKM) = 10) in all cells of the respective cell type, and with no expression
(FPKM = 0) in all other cells. We then determined the pairwise Pearson correlation
between the single-cell expression profile of each perfect marker gene and every 
other transcribed gene. The list of murine transcription factors that were screened 
for cell-type specificity was obtained from the online animal transcription factor 
database (http://www.bioguo.org/AnimalTFDB/) (ref. 38). All genes identified as
cell-type-specific by PCA analysis and hierarchical clustering (see above) also had 
a high Pearson correlation coefficient with the corresponding perfect marker 
gene. The Pearson correlation coefficients for the most strongly correlating genes 
are shown in Supplementary Data together with information about the top 30 genes 
per cell type regarding previous detection in cell types in the lung and regarding 
available literature or known mouse knockout phenotypes. 
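
The ‘perfect marker gene’ search can be illustrated with a short Python sketch. It builds the idealized profile for one cell type and ranks genes by their Pearson correlation with it. The function names, the cell-type labels and the toy log2(FPKM) values are hypothetical, and representing "no expression" as 0 on the log2 scale is an assumption (Pearson correlation is unaffected by the exact high value chosen).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, reported as 0 here
    return sxy / math.sqrt(sxx * syy)


def perfect_marker(cell_types, target, high=10.0):
    """Idealized profile: `high` in cells of `target`, 0 in all other cells."""
    return [high if t == target else 0.0 for t in cell_types]


def rank_markers(expr, cell_types, target):
    """Sort genes by correlation with the perfect marker of `target`."""
    marker = perfect_marker(cell_types, target)
    scores = {gene: pearson(marker, profile) for gene, profile in expr.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy log2(FPKM) profiles across five cells (values are made up).
cell_types = ["AT1", "AT1", "AT2", "AT2", "AT2"]
expr = {
    "geneX": [0.0, 0.0, 9.0, 11.0, 10.0],  # AT2-like
    "geneY": [8.0, 9.0, 0.0, 0.0, 0.0],    # AT1-like
    "geneZ": [5.0, 5.0, 5.0, 5.0, 5.0],    # uniform
}
ranking = rank_markers(expr, cell_types, "AT2")
print(ranking)
```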

Guilt-by-Association analysis (ref. 37) was used to calculate the probability of observing
a given co-expression of two genes by chance. Gene expression values were there- 
fore scaled gene-by-gene by mean-centring and dividing by the standard deviation 
of the respective gene across all single cells, and a binary expression matrix was con- 
structed by defining a gene as expressed in a given cell if the scaled expression level 
was greater than or equal to 0, and as not expressed if it was smaller than 0. Pairwise 
comparisons were performed between the perfect marker gene for each of the four 
mature cell types (AT1, AT2, Clara and ciliated) and all other genes expressed in at 
least one cell (10,946 genes in total). P values were calculated with the hypergeo- 
metric distribution as described in ref. 37, and multiple testing was accounted for 
by the Benjamini-Hochberg method (Fig. 2b and Supplementary Data). 
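
The three steps of this analysis (binarization, a hypergeometric test and Benjamini-Hochberg correction) can be sketched with the Python standard library. This is one plausible reading, not the procedure of ref. 37: the use of a one-sided upper-tail probability is an assumption, and all function names and example values are hypothetical. Because the standard deviation is positive, "scaled expression ≥ 0" is equivalent to "level ≥ the gene's mean", which is what `binarize` tests.

```python
from math import comb


def binarize(values):
    """Expressed = scaled level >= 0, i.e. raw level >= the gene's mean."""
    mean = sum(values) / len(values)
    return [v >= mean for v in values]


def hypergeom_upper_tail(k, n_cells, n_marker, n_gene):
    """P(overlap >= k) when n_gene expressing cells are drawn at random."""
    top = min(n_marker, n_gene)
    hits = sum(comb(n_marker, i) * comb(n_cells - n_marker, n_gene - i)
               for i in range(k, top + 1))
    return hits / comb(n_cells, n_gene)


def coexpression_p(marker_values, gene_values):
    """Chance probability of the observed co-expression of two genes."""
    a, b = binarize(marker_values), binarize(gene_values)
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    return hypergeom_upper_tail(overlap, len(a), sum(a), sum(b))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (made up): a marker and a gene co-expressed in 4 of 10 cells.
marker = [10.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
gene = [9.0, 8.0, 9.0, 7.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(coexpression_p(marker, gene))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```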

Gene ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway
enrichment analyses were performed with DAVID informatics resources 6.7
of the National Institute of Allergy and Infectious Diseases of the NIH” (Supplementary
Data).

Analysis and graphic display of microfluidic single-cell multiplexed qPCR data.
Single-cell multiplexed qPCR data were analysed and displayed by using custom R 
scripts’’. qPCR experiments were performed for E16.5 (two biological replicates), 
E18.5 (two biological replicates) distal lung epithelial cells and for adult AT2 cells 
(one replicate). The limit of detection of multiplexed qPCR values was determined 
as 22 threshold cycles (Ct) by a calibration experiment with 16-fold serial dilutions
of total lung cDNA and six replicates for each concentration. Genes that were not
expressed were given a value higher than the limit of detection and the limit of
detection was subtracted from all Ct values to transform Ct values to log2 expression
values (log2Ex = Ct,LOD − Ct; Ct,LOD = 22). Cells not expressing either of two housekeeping
genes Actb and Gapdh, or expressing them at levels more than three standard deviations
below the mean, were scored as unhealthy and removed from the analysis.
After applying this filter, 74 single cells remained for two experiments at E18.5, 33 
cells for two experiments at E16.5, and 48 cells for the experiment with adult AT2 
cells. In all experiments, cells were isolated from pooled lungs from one litter (five 
to nine lungs). To combine experiments from different chips for the same embryonic 
time point, the expression value of each gene for a given cell was normalized to the 
median gene expression value of that cell. Normalized gene expression values were 
further scaled gene by gene by mean-centring and dividing by the standard deviation 
of expressing cells. PCA and hierarchical clustering using Euclidean distance metric 
were performed in R for all cells, using ten canonical marker genes for bronchiolar and 
alveolar cells (Abca3, Sftpb, Muc1, Sftpc, Lyz2, Aqp5, Pdpn, Ager, Foxj1 and Scgb1a1).
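
The qPCR transformation and scaling steps can be sketched as follows. The sketch assumes that "normalized to the median" means subtracting the cell's median in log2 space and that the caller supplies which cells count as expressing; neither detail is stated in the text. Function names and example values are hypothetical.

```python
import statistics

CT_LOD = 22.0  # limit of detection in threshold cycles, as stated in the text


def ct_to_log2ex(ct):
    """log2Ex = Ct,LOD - Ct."""
    return CT_LOD - ct


def normalize_cell(log2ex_by_gene):
    """Subtract the cell's median expression value from every gene."""
    median = statistics.median(list(log2ex_by_gene.values()))
    return {gene: value - median for gene, value in log2ex_by_gene.items()}


def scale_gene(values, expressing):
    """Mean-centre one gene and divide by the standard deviation,
    both computed over the expressing cells only."""
    reference = [v for v, is_on in zip(values, expressing) if is_on]
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return [(v - mean) / sd for v in values]


# Toy example (made-up Ct values for one cell, then one gene across four cells).
cell = {gene: ct_to_log2ex(ct) for gene, ct in {"Sftpc": 13.0, "Pdpn": 18.0, "Foxj1": 20.0}.items()}
print(normalize_cell(cell))
print(scale_gene([1.0, 3.0, 5.0, 0.0], [True, True, True, False]))
```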

Immunofluorescence. E18.5 lungs were removed en bloc; for whole-mount staining,
the tip of the accessory lobe was excised. Lungs and tips were immersion-fixed
overnight in paraformaldehyde (4% in PBS) at 4 °C, and then dehydrated and stored 
in methanol at −20 °C until being stained. Lungs of adult mice were collected as
above except that after clearance of the pulmonary vasculature, the ventral trachea 
was incised and cannulated, and lungs were gently inflated to full capacity with 
molten low-melting-point agarose (2% in PBS; Sigma). Ice-cold PBS was dripped 
into the thorax to solidify the agarose, and inflated lungs were removed en bloc and 
processed as above. E18.5 lungs were rehydrated, cryoprotected overnight in 30% 
sucrose at 4°C, submerged in OCT (Tissue Tek) in an embedding mould, frozen 
on solid CO2, then stored at −80 °C. Sections of thickness 10 µm obtained with a
cryostat (Leica CM3050S) were collected on chambered glass slides and stored at 
4°C before being stained. 

Similar immunofluorescence protocols were used on whole mounts as on sections,
except that incubation times were increased to compensate for tissue thickness.
Lung tissue was permeabilized (10 min, PBS containing 0.3% Triton X-100),
washed (three 5-min washes in PBS containing 0.1% Tween 20) and blocked (1 h,
PBS containing 10% donkey serum) before incubation overnight with primary anti- 
body. Adult lungs did not require further permeabilization. Primary antibodies 
against the following antigens (used at 1:200 dilution unless otherwise noted) were 
pro-SftpC (rabbit; Chemicon AB3786), Pdpn (hamster; DSHB 8.1.1), E-cadherin 
(rat; Zymed ECCD-2), Rage (rat; R&D), Scgb1a1 (rabbit; Upstate), Lamp3 (sheep,
AF4584; R&D), S100a6 (rat, DDX0192; Dendritics) and Krt15 (mouse, LHK15;
SCBT) directly conjugated to a fluorophore in accordance with the manufacturer’s 
instructions (Alexa Fluor Antibody Labelling Kit; Invitrogen). After further wash- 
ing, sections were incubated with appropriate secondary antibodies conjugated to 
an Alexa fluorophore (donkey A488, A555 or A633; Invitrogen) as well as DAPI 
(5 ng ml⁻¹) for 1 h, followed by washing and mounting in Vectashield (Vector).
Lung tissues were imaged with a laser scanning confocal microscope (LSM 780; Zeiss).

In situ hybridization. For in situ hybridizations, embryonic lungs were collected
as for immunostaining (see above), washed briefly in PBS (autoclaved, diethyl 
pyrocarbonate-treated) and snap-frozen in OCT for sectioning. Sections of thickness
10 µm were generated on the cryostat and stored at −20 °C before further processing.
To validate AT2-specific expression of Egfl6 RNA by in situ hybridization,
sections were transported on dry ice to a company specializing in processing and 
imaging dual in situ hybridized samples, in accordance with the company’s reported 
protocol (ACD’s RNAscope In situ Hybridization Technology). To validate the 
expression of Sftpc and explore its spatial expression pattern in the embryonic mouse 
lung (E11.5, E13.5 and E14.5), in situ hybridizations were performed on whole- 
mount mouse lungs as described previously*'. 

Validation of Hopx as AT1 marker gene by transgenic labelling. Cells actively
transcribing Hopx in the adult mouse lung were labelled by injecting 2 mg of
tamoxifen (Sigma) in corn oil at 20 mg ml⁻¹ concentration intraperitoneally into
postnatal day-28 Hopx-Cre-ERT2+/− mTmG+/− mice. Three days later, lungs were
collected as described above, fixed overnight in paraformaldehyde (4% in PBS) at
4 °C and stored in 80% glycerol at 4 °C before imaging with a laser scanning
confocal microscope (LSM 780; Zeiss) with a 0.8 numerical aperture, 25× oil-immersion
objective and confocal z-sections with a thickness of 2.3 µm.

Extended Data Figure 1 | Schematic illustration of the process of
sacculation. a, Schematic illustration of morphological and molecular changes 
of the distal airways during development. Cell differentiation progresses in a 
directional manner from the bronchio-alveolar junction (proximal) to the 
distal tip (distal) of each terminal airway; progenitor cells therefore persist the 
longest at the tips. Ciliated (green) and Clara (blue) cells mature first, followed 
by the differentiation of flat alveolar type 1 (AT1, orange) and cuboidal type 2 
(AT2, red) cells from cuboidal alveolar progenitors during sacculation
(E16-18.5), when distal airway tubules widen as nascent AT1 cells flatten to
form the gas-exchange surface. b, Micrographs of alveolar (E18.5, postnatal
3 days (PN3d)) and bronchiolar (PN3d) sections of a mouse lung co-stained for
Clara (Scgb1a1, green) and ciliated (Foxj1, red) cell markers as well as AT1
(Pdpn, green) and AT2 (Sftpc, red) specific markers. Progenitor cells at the tips 
of sacculating alveoli are detected by an overlap of AT1 and AT2 specific 
markers. Newly forming alveolar sacs are marked by asterisks.

Extended Data Figure 2 | Single-cell transcriptomics analysis workflow.
a, Workflow of single-cell transcriptomics analysis of mouse lung epithelial
cells. A single captured lung epithelial cell stained with Alexa488 for EpCAM
(green) is indicated by a red arrow. b, Single lung epithelial cells captured in
microfluidic chips with capture sites designed to trap cells with a diameter of
10-17 µm (medium, left) or 17-25 µm (large, right). Cells were stained for
viability with Calcein AM. Even cells captured by the large chip did not exceed a
diameter of ~15 µm, indicating that the medium-sized chips are sufficient for
comprehensively profiling distal mouse lung epithelial cells.

Extended Data Figure 3 | Assessment of required sequencing depth,
technical and biological variation, dynamic range and reproducibility of
single-cell RNA-seq data of 80 single distal lung epithelial cells at E18.5.
a, Saturation analysis reveals the sequencing depth required for the detection of
most genes expressed by single cells. To detect most expressed genes, single-cell
RNA-seq libraries have to be sequenced only to a depth of about 10⁶ reads,
whereas libraries of bulk samples have to be sequenced more deeply. The 
number of genes detected in the ensemble of all single cells (synthetic bulk) is 
comparable to the number of genes detected in the true bulk experiment. Each 
point on the saturation curve was generated by randomly selecting a number of 
raw reads from each sample library (bulk, 200 cell bulk library; single cell, 
single-cell RNA-seq libraries of 80 lung epithelial cells; single-cell ensemble, 
bioinformatically pooled single-cell libraries) and then using the same 
alignment pipeline to call genes with a mean FPKM of more than 1. Each point 
represents four replicate subsamplings; error bars represent s.e.m. b, Technical 
noise and biological variation in single-cell RNA-seq data. Relationship 
between mean expression level and coefficient of variation for 10,946 genes in 
single embryonic lung epithelial cells. Several genes show strong biological
variation (blue): they show higher variability than the average noise at a given
average gene expression. Housekeeping genes are shown in yellow. c, Average
detected transcript levels (mean FPKM, log2) for 92 ERCC RNA spike-ins as
a function of provided number of molecules per lysis reaction for each of the 
three independent single-cell RNA-seq experiments performed at E18.5. Linear 
regression fits through data points are shown. The length of each ERCC RNA 
spike-in transcript is encoded in the size and colour of the data points. No 
particular bias towards the detection of shorter versus longer transcripts is 
observed. The method shows single transcript sensitivity as well as a dynamic 
range of approximately six orders of magnitude, in agreement with a previous 
study evaluating microfluidic single-cell RNA-seq’. d, e, Correlation between 
transcript levels of a 200-cell population and median transcript levels of single 
cells of the same pool of embryonic lungs (d), and transcript levels of two single 
AT2 cells (e). r, Pearson correlation coefficients. f, g, Correlation between 
transcript levels of all genes detected in the single lung and the pooled lung 
experiment (f) and between transcript levels of all genes detected in the two 
independent experiments on pooled embryonic lungs (g). Pearson correlation 
coefficients r are given.

Extended Data Figure 4 | Lineage-specific genes identified by single-cell
transcriptome analysis allow functional description of individual distal lung 
epithelial cell populations. a, Results of gene ontology (GO) and KEGG 
pathway enrichment analyses for distal lung epithelial cell types based on 
lineage-specific genes identified by single-cell RNA-seq of 80 E18.5 distal lung 
epithelial cells (Supplementary Data). b, c, Correlograms visualizing correlation 
of single-cell gene expression profiles between transcription factors (b) or 
receptors/ligands (c) and the major canonical marker genes for bronchiolar 
and alveolar lineages (AT1: Pdpn; AT2: Sftpc; Clara: Scgb1a1; ciliated: Foxj1).
The colour bar denotes the Pearson correlation coefficient from −1 (blue,
anticorrelated genes) to 1 (green, positively correlated genes). d, Validation of 
previously unknown marker genes by single-cell multiplexed qPCR on 74 
single cells isolated from the distal mouse lung epithelium at E18.5. 
Lineage-specific expression of seven new marker genes is shown by clustering
with known markers for respective lineages (AT2, red, previously unknown:
Cftr, Cebpa, Sftpd and Id2; AT1, orange, previously unknown: Vegfa; ciliated,
green, previously unknown: Itgb4 and Top2a; Clara, blue). e, Validation of
Hopx expression in AT1 cells. A lung section from a transgenic Hopx>GFP
adult mouse (Hopx-Cre-ERT2+/−;mTmG+/−) was co-stained for AT1 marker
Pdpn. Maximum-intensity projections of confocal z stacks show that AT1 cells
expressing the membrane-localized GFP reporter (green) also express Pdpn
(white). Scale bar, 50 µm. f, Hierarchical clustering of 46 transgenically labelled
mature Sftpc+ AT2 cells, isolated by FACS from adult mouse lung. Most genes
identified as AT2 lineage-specific from single-cell transcriptomes at E18.5
are transcribed also by mature AT2 cells. In contrast, no or low expression is
observed in mature AT2 cells for the genes specific to the other alveolar or 
bronchiolar lineages as identified from single-cell RNA-seq data at E18.5.

Extended Data Figure 5 | Molecular profiles distinguish developmental
intermediates during the differentiation of AT1 and AT2 cells from a
common BP. a, Hierarchical clustering of multiplexed qPCR gene expression
data for 33 single cells from E16.5 lung epithelium (CD45−/EpCAM+) suggests
the presence at this time point of two major cell lineages, bronchiolar (cyan) 
and alveolar (brown) progenitors. Note that alveolar progenitors express a 
subset of both AT1 and AT2 marker genes. b, PCA of multiplexed qPCR data of 
lung epithelial cells at E16.5 identifies two gene groups in contrast to three 
observed at E18.5 (Fig. 1c). AT1 and AT2 specific marker genes do not 
segregate into distinct populations at E16.5. c, Hierarchical clustering of 
multiplexed qPCR gene expression data for 74 single embryonic lung epithelial 
cells (CD45−/EpCAM+) at E18.5 shows multiple distinct cell populations
consistent with RNA-sequencing data at this time point: BP, AT1, AT2, Clara 
and ciliated cells. Each row represents a single cell and each column a gene. 
Cells are clustered on the basis of expression of marker genes for alveolar and 
bronchiolar lineages (AT2: Abca3, Sftpb, Muc1, Lyz2, Sftpc; AT1: Aqp5, Pdpn,
Ager; ciliated: Foxj1; Clara: Scgb1a1). d, PCA of multiplexed qPCR data 
replicates gene families found by single-cell RNA-seq at E18.5. Gene groups 
were characterized on the basis of differential correlation with the first two 
principal components. e, Developmental sequence of AT1 (orange) and AT2 
(red) specification from a common BP (brown). Two and three maturation 
intermediates were identified in the specification process of AT2 and AT1 cell 
types, respectively, on the basis of the expression of known and previously
unknown marker genes for both alveolar lineages measured by single-cell
RNA-seq (Fig. 3). Transcription factors and receptors/ligands shown here were 
found to be expressed in BP cells and subsequently restricted to one of the 
alveolar lineages. Arrows, differentiation pathway; grey braces, change in 
transcript level of respective genes with tip pointing towards lower expression. 
f-i, Protein level heterogeneity of alveolar epithelial markers during 
sacculation. f, Immunofluorescent micrograph from an E19.5 lung with mature 
AT1and AT2 cells stained for their respective markers (Pdpn (white) and Ager 
(red) for AT 1; Sftpc (green) for AT2). BPs are positive for all three markers. 
Cells in intermediate states are observed, such as early AT1 (Pdpn and Ager 
positive, Sftpc low) and early AT2 cells (Sftpc positive, and either Pdpn 
positve/Ager low or Pdpn low/Ager negative). Scale bar, 10 jum. g, Markers of 
late AT2 cells are expressed heterogeneously at E18.5. Immunofluorescence 
micrograph of a lung from a Lyz2-enhanced green fluorescent protein (eGFP) 
transgenic mouse, in which within the epithelium (E-cadherin, blue) only a 
subset of Sftpc (green)-positive AT2 cells are Lyz2 (red)-positive. Scale bar, 
20 um. h, Immunofluorescent staining of E18.5 lung tissue for Lamp3 (red) 
shows heterogeneous expression of Lamp3 in Sftpc-positive cells (green): 
Proximal cells show higher Lamp3 expression than distal cells. Blue, 
DAPI-stained nuclei. Scale bar, 20 lm. i, Inmunofluorescent staining of E18.5 
lung tissue for $100a6 (red) shows heterogeneous expression of the secreted 
protein $100a6 in Pdpn-positve cells (green). Blue, DAPI-stained nuclei. 
Scale bar, 20 µm.

Extended Data Figure 6 | Following Sftpc-expressing cells throughout their
life cycle. a, Whole-mount in situ hybridizations of embryonic mouse lungs at
E11.5, E13.5 and E14.5 using probes against Sftpc mRNA show expression
of Sftpc specific to the tips of the epithelial tree branches. Moreover, variations
in signal intensity indicate heterogeneity in the level of Sftpc expression across
cells, which is in agreement with our single-cell RNA-seq data of Sftpc+ cells
at E14.5 (see Fig. 4a). b, Diagram of the different transcriptional states in the
specification of an AT2 cell as identified by single-cell RNA-seq of Sftpc+ cells
from distal mouse lung epithelium of embryonic (E14.5, E16.5 and E18.5) and
adult mice. The cell undergoes a transition from an early (A) and late (B) early
progenitor (EP) state into a BP state before either taking the AT1 fate (nascent AT1),
or following the AT2 pathway to become a nascent and finally a mature
AT2 cell. Groups of genes turning on/up or off/down during the individual
transitions are shown above and below each arrow, respectively (Fig. 4a and 
Supplementary Data). Whereas EP and BP cells are double positive for Sftpc 
and Pdpn, nascent and mature AT2 cells express Sftpc but turn off expression of 
the AT1 marker Pdpn. The developmental time points at which the individual 
cell states were detected, and their putative locations, are shown.

Extended Data Figure 7 | The number of unique genes and the total number
of transcripts expressed by a single cell strongly correlates with its 
differentiation state. a, Saturation analysis of single-cell RNA-seq data of lung 
epithelial cells at different embryonic and adult time points (E14.5, E18.5 and 
adult AT2) reveals that the number of unique genes expressed by single lung 
epithelial cells decreases with progressing differentiation state. Distal lung 
epithelial cells at E14.5 express more than 6,000 genes, whereas cells at E18.5 
express about 3,000 genes, and mature AT2 cells only about 2,000 genes. Each 
point on the saturation curve was generated by randomly selecting a number 
of raw reads from each sample library and then using the same alignment 
pipeline to call genes with a mean FPKM of more than 1. Each point represents 
four replicate subsamplings. Error bars represent s.e.m. All libraries were 
sequenced to a depth of at least 2 × 10^6 reads. b, Single-cell RNA-seq reveals 
that the total number of transcripts expressed by single cells decreases with 
increasing differentiation state of the cell. The number of transcripts per cell 
was calculated from the FPKM values of all genes in each cell, using the 
correlation between number of transcripts of exogenous spike-in mRNA 
sequences and their respective measured mean FPKM values (example 
calibration curves are shown in Extended Data Fig. 3c for three replicates at 
E18.5). Area-normalized density distributions are shown for embryonic cells at 
E14.5 (45 cells), E16.5 (27 cells) and E18.5 (80 cells), and for 46 Sftpc+ adult 
AT2 cells. The number of transcripts is highest in lung epithelial progenitor 
cells at E16.5 and E14.5 and decreases in cells at E18.5 and even further in 
mature AT2 cells. Note that single-cell RNA-seq libraries for E14.5, E18.5 and 
adult AT2 cells were sequenced to a depth of (2-6) × 10^6 reads, whereas the 
libraries for cells at E16.5 were sequenced to a lower depth of 100,000-550,000 
reads. c, Calibration of Ct values measured by single-cell qPCR to number 
of molecules. Average detected transcript levels (log2Ex = Ct,LOD − Ct; 
Ct,LOD = 22) for six ERCC RNA spike-ins as a function of provided number of 
molecules per lysis reaction for each of three independent single-cell qPCR 
experiments performed on embryonic (E16.5, two replicates; red and green) 
and adult mouse lung (adult AT2, one replicate; blue). Linear regression 
fits through data points and corresponding equations are shown and were 
used to convert Ct values measured by qPCR into numbers of transcripts. 
d, Single-cell qPCR confirms the presence of a higher number of transcripts in 
lung epithelial progenitor cells in comparison with fully differentiated alveolar 
epithelial cells. The median number of transcripts per cell as detected by 
single-cell RNA-seq (y axis) and by single-cell multiplexed qPCR of 90 genes 
(x axis) is shown for distal lung epithelial cells at E16.5 (qPCR, 33 cells; 
RNA-seq, 27 cells) and mature AT2 cells (qPCR, 48 cells; RNA-seq, 46 cells). 
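
The calibration described in panel c can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the spike-in values are made-up placeholders, the function names (`fit_line`, `ct_to_molecules`) are hypothetical, and the fit is assumed to be an ordinary least-squares line of log2Ex against log2 of the number of spike-in molecules, with log2Ex = Ct,LOD − Ct and Ct,LOD = 22 as given in the caption.

```python
import math

CT_LOD = 22.0  # limit-of-detection Ct stated in the caption


def log2_ex(ct):
    """Expression on the log2 scale: log2Ex = Ct,LOD - Ct."""
    return CT_LOD - ct


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ct_to_molecules(ct, slope, intercept):
    """Invert the calibration line to estimate a molecule count from a Ct value."""
    return 2.0 ** ((log2_ex(ct) - intercept) / slope)


# Hypothetical spike-in data (NOT from the paper):
# molecules provided per lysis reaction and the Ct measured for each spike-in.
spike_molecules = [4, 16, 64, 256, 1024, 4096]
spike_ct = [19.0, 17.0, 15.0, 13.0, 11.0, 9.0]

slope, intercept = fit_line(
    [math.log2(m) for m in spike_molecules],
    [log2_ex(ct) for ct in spike_ct],
)
estimated = ct_to_molecules(14.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, Ct 14 -> {estimated:.0f} molecules")
```

In practice one line would be fitted per experiment, since the caption shows separate regression equations for each of the three replicates.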
Extended Data Figure 8 | Transcriptional states during the early lifetime of 
the Clara cell lineage identified by single-cell RNA-seq of Scgb3a2+ cells 
at E14.5, E16.5 and E18.5. a, Hierarchical clustering of 24 Scgb3a2-positive 
cells from distal mouse lung epithelium at different embryonic time points 
(E14.5, E16.5 and E18.5) based on the genes with highest principal-component 
loadings in an unbiased PCA analysis of all cells and all genes (shown in c). Cells 
are shown in rows, genes in columns. Cells cluster into three major groups. 
Scgb3a2 and Scgb1a1 transcript levels are shown in bars on the right. Whereas 
canonical Clara cell marker Scgb1a1 is first detected at E18.5, Scgb3a2 is 
detected as early as E14.5, suggesting that it is an early Clara cell marker. b, Gene 
Ontology (GO) enrichments of the three different gene clusters as well as 
transcription factors (TFs) belonging to the different groups of genes. c, PCA 
analysis of all Scgb3a2-positive cells and all genes identifies three different 
cell populations that were identified as bronchiolar progenitor as well as Clara 
and ciliated cells. 
Disruption of Mediator rescues the stunted growth of 
a lignin-deficient Arabidopsis mutant 


Nicholas D. Bonawitz, Jeong Im Kim, Yuki Tobimatsu, Peter N. Ciesielski, Nickolas A. Anderson, Eduardo Ximenes, 
Junko Maeda, John Ralph, Bryon S. Donohoe, Michael Ladisch & Clint Chapple 


Lignin is a phenylpropanoid-derived heteropolymer important for 
the strength and rigidity of the plant secondary cell wall. Genetic 
disruption of lignin biosynthesis has been proposed as a means to 
improve forage and bioenergy crops, but frequently results in stunted 
growth and developmental abnormalities, the mechanisms of which 
are poorly understood. Here we show that the phenotype of a lignin- 
deficient Arabidopsis mutant is dependent on the transcriptional 
co-regulatory complex, Mediator. Disruption of the Mediator com- 
plex subunits MED5a (also known as REF4) and MED5b (also known 
as RFR1) rescues the stunted growth, lignin deficiency and widespread 
changes in gene expression seen in the phenylpropanoid pathway 
mutant ref8, without restoring the synthesis of guaiacyl and syringyl 
lignin subunits. Cell walls of rescued med5a/5b ref8 plants instead 
contain a novel lignin consisting almost exclusively of p-hydroxyphenyl 
lignin subunits, and moreover exhibit substantially facilitated poly- 
saccharide saccharification. These results demonstrate that guaiacyl 
and syringyl lignin subunits are largely dispensable for normal growth 
and development, implicate Mediator in an active transcriptional 
process responsible for dwarfing and inhibition of lignin biosyn- 
thesis, and suggest that the transcription machinery and signalling 
pathways responding to cell wall defects may be important targets 
to include in efforts to reduce biomass recalcitrance. 

The phenolic polymer lignin has a central role in both the fitness of 
plants and their utility to humans. During development, lignin is depos- 
ited along with cellulose and hemicelluloses in the secondary cell walls 
of structural fibres and water-conducting cells, where it is essential for 
their strength and rigidity. Lignin deposition is also induced at sites 
of wounding or pathogen attack, impeding the action of fungal and 
bacterial cellulolytic enzymes and thus inhibiting pathogen invasion 
of surrounding tissues. Lignin limits the value of biomass crops used 
for forage or lignocellulosic biofuel production by interfering with the 
breakdown of cellulose and other wall polysaccharides to simple sugars. 
Attempts to reduce biomass recalcitrance through genetic manipula- 
tion of lignin deposition have met with some success, but the stunted 
growth of many of the resulting plants and the associated yield penalty 
have made the use of similar genetic modifications in commercial biomass 
crops problematic. The molecular events underlying the dwarfing 
of lignin biosynthetic mutants are at present poorly understood, 
but may involve lignin deficiency per se, loss (or gain) of a related non- 
lignin metabolite, or the activation of a cell wall integrity monitoring 
pathway analogous to that triggered by the receptor-like tyrosine kinase 
THE1 in response to cellulose deficiency. Understanding and potentially 
mitigating the pleiotropic phenotypes of lignin biosynthetic mutants 
could markedly alter the economic feasibility of lignocellulosic biofuel 
production. 

The lignin polymer is derived mainly from three chemically related but 
distinct hydroxycinnamyl alcohols that, after oxidation and polymerization 
in the secondary cell wall, give rise to p-hydroxyphenyl (H), guaiacyl 
(G) and syringyl (S) lignin subunits (Fig. 1a). Although S lignin is dispensable 
for normal growth, disruption of shared phenylpropanoid 
pathway enzymes upstream of both G and S lignin leads to considerable 
developmental abnormalities. For instance, the Arabidopsis mutant 
reduced epidermal fluorescence 8-1 (ref8-1) is severely dwarfed and sterile 
due to a missense mutation in the gene encoding p-coumaroylshikimate 
3’-hydroxylase (C3’H) that reduces, but does not eliminate, its activity. 
Figure 1 | Disruption of MED5a/5b rescues growth and fertility of 
Arabidopsis ref8 mutants. a, Simplified view of the phenylpropanoid 
pathway, showing the position of REF8/C3’H in relation to the pathway 
branches leading to H, G and S lignin subunits and sinapoylmalate. 

b, Three-week-old wild-type, ref8-1, ref8-2, med5a/5b, med5a/5b ref8-1 and 
med5a/5b ref8-2 plants photographed under ultraviolet light. Disruption of 
MED5a/5b does not restore normal blue epidermal fluorescence in either ref8-1 
or ref8-2 mutant plants. c, Plants with the same genotypes as b at 4.5 weeks after 
planting, after plants have flowered and begun to set seed. 
Plants containing an inactivating insertional mutation in the REF8 
gene halt development after the production of only two or three extremely 
small leaves. Despite retaining all of the phenylpropanoid 
pathway enzymes required for the synthesis of H lignin subunits, ref8 
mutants synthesize substantially reduced levels of total lignin, suggesting 
that lignin biosynthesis may be actively repressed in response to 
reductions in C3’H activity. 

The Mediator complex subunits MED5a and MED5b have recently 
been shown to be required for homeostatic repression of phenylpropanoid 
biosynthesis in Arabidopsis. Mediator is a large, multisubunit 
transcriptional co-regulator that is increasingly recognized as an essential 
element of both basal and regulated eukaryotic transcription. Whereas 
some subunits of Mediator are required globally for transcription, other 
subunits are dispensable and appear to be required only for specific 
transcriptionally regulated processes. In plants, these processes include 
pathogen response, freezing tolerance, transition to reproductive growth 
and phytochrome signalling. 

To test whether MED5a/5b could be involved in the repression of 
lignin biosynthesis in ref8-1 mutants, we generated med5a/5b ref8-1 triple 
mutants. Although ref8-1 mutants contain only 40% of the amount of 
lignin found in wild-type plants, lignin deposition in med5a/5b ref8-1 
triple mutants is restored to wild-type levels (Table 1). In addition, disruption 
of MED5a/5b in the ref8-1 mutant background led to almost 
complete rescue of the ref8-1 growth phenotype, with the morphology 
of med5a/5b ref8-1 plants virtually indistinguishable from that of med5a/5b 
plants at multiple stages of growth (Fig. 1b, c). Unlike ref8-1 mutant 
plants, which produce stunted inflorescence stems and are uniformly 
sterile, med5a/5b ref8-1 plants flower normally and produce robust inflo- 
rescence stems and siliques containing fertile seeds (Fig. 1c). Disruption 
of MED5a/5b also ameliorates the developmental arrest phenotype of 
the null insertional mutant ref8-2. Notably, the rescue of med5a/5b ref8-2 
plants is less complete than that of med5a/5b ref8-1 plants, suggesting 
that the residual C3’H activity of the latter may be important for their 
relatively normal growth. On the basis of these results, we conclude that 
MED5a/5b, and by extension Mediator, has an active role in generat- 
ing both the lignin deficiency and the dwarf phenotype of Arabidopsis 
ref8 mutants. 

Observation of med5a/5b ref8-1 and med5a/5b ref8-2 mutants under 
ultraviolet light revealed that they retain the reduced epidermal fluorescence 
phenotype associated with decreased sinapoylmalate accumulation 
(Fig. 1b), suggesting that the rescue of their stunted growth does not 
arise from suppression of their underlying metabolic defect. This con- 
clusion is supported by metabolite analysis of med5a/5b ref8-1 and med5a/ 
5b ref8-2 mutants, both of which show substantially lower levels of con- 
iferyl and sinapyl alcohol-derived metabolites than wild-type plants, 
as well as hyperaccumulation of p-coumarate and p-coumaryl alcohol-derived 
compounds (Extended Data Fig. 1). The flavonoid hyperaccumulation 
phenotype associated with loss of C3’H activity is substantially 
relieved in med5a/5b ref8-1 and med5a/5b ref8-2 mutants 
(Extended Data Fig. 2), consistent with the notion that hyperaccumulation 
of these compounds is a result of stress-induced synthesis, and 
not ‘metabolic overflow’, as previously suggested. 

Microscopic and histochemical analysis of inflorescence stem cross- 
sections revealed multiple cell wall abnormalities in ref8 mutants that 
were rescued by disruption of MED5a/5b, as well as a number of signi- 
ficant differences between med5a/5b ref8-1 mutants and wild-type plants. 
For instance, med5a/5b ref8-1 cross-sections show significantly more 


Table 1 | Thioglycolic acid quantification of total lignin 

Genotype            A280nm mg fresh weight⁻¹ ± s.d.* 
Wild type           0.116 ± 0.010 
med5a/5b            0.122 ± 0.007 
ref8-1              0.046 ± 0.003 
med5a/5b ref8-1     0.102 ± 0.011 

A280nm, absorbance at 280 nm. 
*n = 3. 
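
As a quick check on the "only 40%" figure quoted in the text, the Table 1 means can be expressed relative to wild type. The sketch below uses only the values printed in Table 1; the variable and function names are illustrative, and the ratio of means deliberately ignores the reported standard deviations.

```python
# Mean A280nm per mg fresh weight and s.d. (n = 3), copied from Table 1.
lignin = {
    "Wild type": (0.116, 0.010),
    "med5a/5b": (0.122, 0.007),
    "ref8-1": (0.046, 0.003),
    "med5a/5b ref8-1": (0.102, 0.011),
}


def percent_of_reference(table, reference="Wild type"):
    """Express each genotype's mean as a percentage of the reference mean."""
    reference_mean = table[reference][0]
    return {
        genotype: 100.0 * mean / reference_mean
        for genotype, (mean, _sd) in table.items()
    }


relative = percent_of_reference(lignin)
for genotype, pct in relative.items():
    print(f"{genotype:<16} {pct:5.1f}% of wild type")
```

On these means, ref8-1 comes out at roughly 40% of wild type and med5a/5b ref8-1 at roughly 88%.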
intense staining compared with ref8-1 cross-sections when treated with 
either Maule reagent (Fig. 2a) or phloroglucinol (Fig. 2b), two commonly 
used stains for lignified tissues. The red staining of S lignin subunits by 
Maule reagent, however, is substantially less intense in med5a/5b ref8-1 
mutants than in wild-type or med5a/5b mutant plants, consistent with 
a loss of S subunits due to decreased C3’H activity. Treatment with phloroglucinol, 
which is thought to react primarily with lignin-associated 
hydroxycinnamaldehyde moieties, also results in more intense staining 
in med5a/5b ref8-1 plants than in ref8-1 plants. The reproducible 
difference in the colour of phloroglucinol-stained med5a/5b ref8-1 tissues 
(bright, reddish-purple) compared with wild-type and med5a/5b tissues 
(darker purple) was also observed in med5a/5b ref8-2 tissues (Extended 
Data Fig. 3) and is possibly due to differences in the reaction of phlor- 
oglucinol with p-coumaraldehyde (4-hydroxycinnamaldehyde) versus 
coniferaldehyde (4-hydroxy-3-methoxycinnamaldehyde). Vascular bun- 
dles of med5a/5b ref8-1 plants contain relatively normal, open xylem 
elements, with only a few exhibiting partial collapse, in contrast with 
those of ref8-1 plants, which uniformly exhibit severely collapsed and 
irregular xylem (Fig. 2a, b). At higher levels of magnification, using either 
confocal (Fig. 2c) or transmission electron microscopy (Fig. 2d), sec- 
ondary cell walls of ref8-1 mutants show considerable disorganization 
and are significantly thicker (Extended Data Fig. 4) than wild-type cell 
walls, possibly due to altered hydrophobicity and architecture of the 
wall in the absence of normal levels of lignin. By contrast, cell walls of 
med5a/5b ref8-1 mutants show relatively normal architecture, but are 
slightly thinner than those of wild-type plants (Fig. 2d and Extended 
Data Fig. 4). 

To obtain quantitative information on the lignin composition and 
substructure of med5a/5b ref8-1 cell walls, we used solution-state two- 
dimensional nuclear magnetic resonance spectroscopy (2D-NMR) and 
derivatization followed by reductive cleavage (DFRC) lignin analyses. 



Figure 2 | Secondary cell wall composition and architecture of med5a/5b 
ref8-1 mutant plants is distinct from both wild-type and ref8-1 mutant 
plants. a, Maule staining, which stains S subunits red and G subunits 
orange-brown. b, Phloroglucinol staining, which is specific for 
lignin-associated hydroxycinnamaldehydes. c, Confocal microscopy of tissue 
stained with the fluorescent dye acriflavin. d, Transmission electron 
microscopy of cell wall corners between adjacent fibre cells. All images are 
derived from thin sections of inflorescence stems of wild-type, med5a/5b, ref8-1 
and med5a/5b ref8-1 plants, as indicated at the bottom of each column. 
Examination of the aromatic regions of short-range 1H-13C correlation 
spectra revealed that lignin isolated from med5a/5b ref8-1 mutants 
consists almost entirely (~95%) of H-lignin subunits, and yields NMR 
signal patterns nearly identical to those of a synthetic lignin prepared 
solely from p-coumaryl alcohol (Fig. 3a). By contrast, both wild type 
and med5a/5b mutants contain typical G-rich, coniferyl- and sinapyl- 
alcohol-derived G/S lignin, with H monomers accounting for less than 
2% of total lignin. These NMR data are consistent with lignin compo- 
sition of the same plants as determined by the DFRC method (Fig. 3b), 
which quantifies lignin monomeric units released by cleavage of aryl- 
ether-type lignin intersubunit linkages. The high level of H lignin in 
med5a/5b ref8-1 mutants is well beyond that previously reported in C3’H 
(and related hydroxycinnamoyl-CoA shikimate hydroxycinnamoyl- 
transferase (HCT)) knockdown plants, and importantly is not accompa- 
nied by the aberrant cell wall architecture and morphological abnormalities 
that are typically associated with loss of C3’H (Fig. 2c, d). Also revealed 
by 2D-NMR were major changes in lignin substructure arising from 
differences in the relative abundance of the various intersubunit linkages 
(Extended Data Fig. 5). Most notably, med5a/5b ref8-1 plants showed a 
substantial decrease in aryl-ether-type (β-O-4) linkages that arise from 
the addition of a lignin monomer to the phenolic end of a growing polymer 
and are the most abundant in wild-type lignin. This decrease is 
offset by a twofold increase in phenylcoumaran-type (β-5) and a threefold 
increase in resinol-type (β-β) linkages. These changes in turn provide 
an explanation for the significant reduction in total lignin monomers 
determined by the DFRC method (Fig. 3b), as this method does not 
release lignin monomers linked to adjacent monomers via either β-5 
or β-β linkages. We also observed a substantial (~16-fold) increase 
Figure 3 | Lignocellulosic material from med5a/5b ref8-1 plants contains 
lignin composed almost entirely of H subunits and shows substantially 
increased saccharification potential. a, 2D-NMR analysis of lignin 
composition of wild-type, med5a/5b and med5a/5b ref8-1 plants, and a 
synthetic H-only lignin (H-dehydrogenation polymer (H-DHP)). Resonance 
signals arising from H, G and S subunits are colour coded to match the 
structures shown on the right. b, Gas chromatographic quantification of H, 
G and S lignin monomers after reductive cleavage of aryl-ether linkages by the 
DFRC method. c, Glucose yield after incubation of lignocellulosic material with 
cellulase and β-glucosidase either without pre-treatment (top) or after liquid 
hot water pre-treatment (bottom). Error bars in b and c represent the standard 
deviation (s.d.) of three biological replicates. 
in the proportion of normally minor arylglycerol end units in the high 
H lignin of med5a/5b ref8-1 mutants. Similar changes in lignin sub- 
structures were previously observed during our NMR analyses of other 
high-H-lignin plants and are probably attributable to the substantial 
differences in the polymerization characteristics of p-coumaryl alcohol 
versus the typical coniferyl and sinapyl alcohols. The observation of 
elevated proportions of arylglycerol end units also suggests that the lignin 
of med5a/5b ref8-1 mutants consists of a greater number of shorter poly- 
mers (that is, exhibits a lower degree of polymerization) than the lignin 
of wild-type and med5a/5b plants. Consistent with this interpretation, 
gel-permeation chromatography analyses revealed a significant under- 
representation of high-molecular-weight species in lignin of med5a/5b 
ref8-1 plants compared with wild-type and med5a/5b lignin (Extended 
Data Fig. 6). 

To test whether the altered lignin composition and structure of med5a/ 
5b ref8-1 plants leads to facilitated breakdown of cell wall polysaccharides, 
lignocellulosic material from mutant and wild-type plants was subjected 
to a mixture of commercial cellulase and β-glucosidase enzymes and 
the resulting release of glucose monitored over time (Fig. 3c). These ana- 
lyses revealed a substantial increase in glucose yield from med5a/5b ref8-1 
biomass compared with samples from wild-type or med5a/5b plants. 
Non-pre-treated med5a/5b ref8-1 samples, for instance, showed more 
than twice the glucose yield of non-pre-treated wild-type and med5a/5b 
samples after 24 h of treatment with cellulase and β-glucosidase (~70% 
versus ~30% of total glycan). Pre-treatment of med5a/5b ref8-1 mate- 
rial with liquid hot water increased the glucose yield to ~80% of total 
glycan after 24 h, compared with 50% and 40% of total glycan for wild- 
type and med5a/5b material, respectively. In the case of both pre-treated 
and non-pre-treated med5a/5b ref8-1 material, increasing the time of 
cellulase/β-glucosidase incubation did not significantly increase the total 
glucose yield. These data suggest that genetically modified, high-H-lignin 
feedstocks could potentially be used to reduce the cost of biomass pre- 
treatment and enzymatic saccharification, thus substantially improving 
the economic balance sheet of lignocellulosic biofuel production. 

To determine the possible effects on gene expression of disrupting 
MED5a/5b in the ref8-1 mutant background, we used high-throughput 
messenger RNA sequencing to identify differentially expressed genes 
among wild-type, med5a/5b, ref8-1 and med5a/5b ref8-1 plants (Fig. 4 and 
Supplementary Data 1). Consistent with our previous report, med5a/5b 
plants exhibited elevated transcript levels of phenylpropanoid biosynthetic 
genes, including those encoding PAL, C4H, 4CL, C3’H, CCR, CAD 
and the recently identified caffeoylshikimate esterase, CSE (Extended 
Data Fig. 7). Moreover, ‘phenylpropanoid biosynthesis’ was the term 
most significantly overrepresented among ontologies of the 248 genes 
overexpressed in med5a/5b plants (Supplementary Table 1). In ref8-1 
plants, a total of 8,772 genes, representing a substantial fraction of the 
genome, were expressed at levels significantly different from wild type. 
Drought- and dehydration-responsive genes showed the most signifi- 
cant association with increased expression in ref8-1 plants (Supplemen- 
tary Table 2), consistent with the suggestion that lignin deficiency may 
lead to impaired water transport. The increased expression of flavonoid 
biosynthesis genes in ref8-1 plants (Extended Data Fig. 8) further sup- 
ports the conclusion that flavonoid hyperaccumulation in this mutant 
is due to stress-induced synthesis rather than metabolic overflow. Also 
among the genes overexpressed in ref8-1 mutants were the same phenyl- 
propanoid genes that showed increased expression in med5a/5b mutants 
(Extended Data Fig. 7), indicating that the reduced total lignin in ref8-1 
mutants is not a result of Mediator-dependent transcriptional repres- 
sion of these genes. Consistent with the reduced size of ref8-1 mutants, 
gene ontology term analysis revealed that a high proportion of the genes 
with decreased expression in these plants were related to basic processes 
of growth, cell division and primary metabolism (Supplementary Table 3). 
Remarkably, of the ~9,000 genes exhibiting increased or decreased tran- 
script levels in ref8-1 plants, nearly 90% were expressed at wild-type 
levels in med5a/5b ref8-1 plants (Fig. 4a). Of the genes exhibiting dif- 
ferential expression in both ref8-1 and med5a/5b ref8-1 plants, a substantial 
Figure 4 | Disruption of MED5a/5b rescues the widespread transcriptional 
reprogramming of the ref8-1 mutant. a, Venn diagrams showing the number 
of genes with significantly increased (top) or decreased (bottom) expression 
in med5a/5b, ref8-1 and med5a/5b ref8-1 mutants compared with wild-type 
plants, as well as the intersection of these sets. b, Relationship between 
expression in med5a/5b ref8-1 mutants (x-axis) and ref8-1 mutants (y-axis) of 
all genes either upregulated (upper right quadrant) or downregulated (lower 
left quadrant) in both ref8-1 and med5a/5b ref8-1 plants compared with wild 
type. Each gene is represented by one point. Genes exhibiting a significant 
difference in expression level between ref8-1 and med5a/5b ref8-1 are shown in 
red. For all RNA-seq data, significance corresponds to a multiple-test-adjusted 
P value < 0.05, as determined using the DESeq algorithm. Genes for which 
there is no significant difference are shown in black. 
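
The caption states only that P values were multiple-test adjusted using DESeq. As an illustration of what such an adjustment does, the sketch below implements the Benjamini-Hochberg procedure, which is the adjustment DESeq applies by default; that this exact procedure was used here is an assumption, and the raw P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, keeping the adjusted
    # values monotone by carrying the running minimum downwards.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values for five genes (NOT from the paper).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

Note that two genes with raw P values below 0.05 no longer pass the threshold after adjustment, which is why the caption specifies the adjusted value.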


number are misregulated to a lesser degree in med5a/5b ref8-1 plants, 
with transcript levels much closer to those in wild-type plants (Fig. 4b). 
With the exception of MYB75 (also known as PAP1) and MYB90 (also 
known as PAP2), the expression of phenylpropanoid-controlling transcription 
factors is largely unaffected in the mutants. The moderate overexpression 
of MYB75 and MYB90 in ref8-1 and med5a/5b ref8-1 mutants 
is consistent with their elevated levels of anthocyanins, but is unlikely 
to contribute to their dwarf phenotype, as the pap1-D mutant is not 
dwarfed, despite much greater MYB75 overexpression. Taken together, 
these results demonstrate that genetic disruption of C3’H activity 
results in widespread qualitative and quantitative changes in gene expres- 
sion in ref8-1 mutant plants, and that MED5a/5b are required, directly 
or indirectly, for the great majority of these changes. 
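
The "nearly 90%" rescue figure is a set comparison between the genes misregulated in ref8-1 and those still misregulated in med5a/5b ref8-1. A minimal sketch of that calculation is shown here; the gene identifiers are placeholders rather than real loci, and the function name is hypothetical.

```python
def rescued_fraction(misregulated_in_single, misregulated_in_rescued):
    """Fraction of genes misregulated in the single mutant that are no longer
    misregulated (relative to wild type) in the rescued line."""
    if not misregulated_in_single:
        raise ValueError("no misregulated genes in the reference mutant")
    back_to_wild_type = misregulated_in_single - misregulated_in_rescued
    return len(back_to_wild_type) / len(misregulated_in_single)


# Placeholder gene sets (NOT real data): ten genes changed in the single
# mutant, of which one remains changed in the rescued triple mutant.
changed_in_ref8_1 = {f"gene{i}" for i in range(1, 11)}
changed_in_triple = {"gene10", "gene11"}

fraction = rescued_fraction(changed_in_ref8_1, changed_in_triple)
print(f"{fraction:.0%} of misregulated genes return to wild-type expression")
```

Genes misregulated only in the triple mutant (such as the placeholder gene11) do not affect the fraction, because the denominator is restricted to genes changed in the single mutant.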

The molecular events underlying the dwarfing of lignin-deficient 
mutants are complex and poorly understood. The experiments described 
here provide new insights into this phenomenon by demonstrating that 
MED5a/5b are required for both the stunted growth and the lignin defi- 
ciency of Arabidopsis ref8 mutants. Taken together with the established 
roles of the Mediator complex in transcriptional regulation, and of MED5a/ 
5b in the repression of phenylpropanoid metabolism, these observations 
lead us to propose the following model (Extended Data Fig. 9). MED5a/5b 
are key components of an active, transcriptional process by which phe- 
nylpropanoid homeostasis is maintained in wild-type plants. In ref8 
mutants, the metabolic block at C3’H leads to changes in one or more 
wall-bound or soluble phenylpropanoid metabolites, eliciting an inap- 
propriate or exaggerated response by this normally homeostatic pathway 
and initiating a transcriptional cascade that ultimately results in repres- 
sion of lignification and impaired growth. The failure to induce or repress 
one or more direct targets of MED5a/5b in med5a/5b ref8 plants inter- 
rupts this response, thereby restoring their ability to grow and to syn- 
thesize wild-type levels of a novel, nearly pure H lignin. 

The surprising observation that G and S lignin are largely dispens- 
able for normal growth and development in the permissive med5a/5b 
background raises important new questions for future investigation. Given 
that ref8 mutants do not exhibit transcriptional repression of phenylpropanoid 
biosynthetic genes, and that H subunits can be polymerized 
into a functional lignin polymer, it is unclear why ref8 mutants should 
fail to lignify normally. One possibility is that, in addition to their role 
in transcriptional regulation of phenylpropanoid biosynthesis, MED5a/5b 
may also control the expression of one or more target genes responsible 
for post-transcriptional repression of monolignol synthesis, transport 
or polymerization. If true, this would suggest that the failure to produce 
sufficient total lignin, and not the loss of G and S subunits, could be the 
underlying cause of dwarfing in ref8 mutants. Consistent with this pos- 
sibility, we have observed that disruption of MED5a/5b does not rescue 
the stunted growth of the C4H-deficient ref3-2 mutant (data not shown), 
indicating that alterations in the Mediator complex do not relieve the 
requirement for an unobstructed pathway to at least one of the canon- 
ical monolignols. Alternatively, it is possible that Mediator represses 
the growth of ref8 mutants independently of its role in repressing lignification, 
for instance by modulating the activity of a wall integrity-sensing 
pathway similar to those mediated by THE1 or WAK1. Indeed, we cannot 
formally exclude the possibility that repression of lignification in 
ref8 mutants is the result, rather than the cause, of their impaired growth. 
It will be important to identify the direct targets of MED5a/5b among the 
thousands of misregulated genes in ref8 mutants to distinguish between 
these alternative models. It is also notable that although it has been suggested 
that dwarfing in lignin-deficient mutants is due to hyperaccumulation 
of salicylic acid, this is apparently not the case for Arabidopsis 
ref8-1 mutants; rescued med5a/5b ref8-1 mutants contain higher levels 
of salicylic acid than ref8-1 plants (Extended Data Fig. 10a), and disrup- 
tion of the salicylic acid biosynthetic enzyme isochorismate synthase does 
not alleviate the dwarfism of ref8-1 mutants (Extended Data Fig. 10b). 

Finally, our saccharification analysis of med5a/5b ref8-1 plants demon- 
strates that in the proper genetic background it is possible to create high- 
H-lignin plants with greatly reduced biomass recalcitrance, while at the 
same time avoiding the severe yield penalty normally associated with 
loss of G and S lignin. Together with previously reported evidence that 
the role of Mediator in phenylpropanoid repression may be conserved, 
this observation suggests that similar genetic modifications could lead 
to improved biomass crops for forage and bioenergy. Further elucida- 
tion of the gene products upstream and downstream of Mediator in 
the transcriptional response to phenylpropanoid pathway defects pro- 
mises to yield further insight into this phenomenon and may reveal 
additional targets for bioengineering. 


METHODS SUMMARY 


Plants simultaneously containing mutations in MED5a, MED5b and REF8 were 
generated by crossing previously described transfer DNA (T-DNA)- and ethyl meth- 
anesulphonate (EMS)-derived mutants. PCR-based methods were used for geno- 
type determination and for the identification of mutants of interest. Cell walls of 
mutant and control plants were visualized using a variety of staining methods in 
conjunction with bright-field, confocal and transmission electron microscopy. Lignin 
composition and substructures were determined using 2D-NMR as well as DFRC, 
a gas-chromatography-based method that detects lignin monomers liberated by 
cleavage of all lignin β-O-4 bonds. Changes in soluble phenylpropanoid metabolism 
were monitored using high-performance liquid chromatography. Recalcitrance 
of lignocellulose was determined by treating with commercial cellulase and 
β-glucosidase, and monitoring the release of glucose over time. Gene expression 
differences between genotypes were determined by high-throughput sequencing 
of mRNA extracted from whole rosettes before the transition to flowering (that is, 
RNA-seq). 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Plant material and growth conditions. Plants were grown in Redi-earth Plug and 
Seedling Mix (Sun Gro Horticulture) supplemented with Scotts Osmocote Plus controlled 
release fertilizer (Hummert International) at a temperature of 22 °C and 
a light intensity of 100 µE m⁻² s⁻¹, with a 16 h light/8 h dark photoperiod. All 
photographic images presented are unaltered with respect to brightness, colour 
and contrast. In some cases, portions of single images have been cropped or rear- 
ranged for neatness and clarity of presentation. 

The Arabidopsis ref8-1 and ref3-2 mutants were isolated from a forward genetic 
screen for plants deficient in the synthesis of sinapoylmalate and have been described 
previously. The sid2-4 mutant (The Arabidopsis Information Resource (TAIR) 
accession number SALK_133146) contains a transfer DNA (T-DNA) insertion in 
the second intron of the gene encoding the salicylic acid biosynthetic enzyme iso- 
chorismate synthase and was obtained from the Arabidopsis Biological Research 
Center (ABRC; The Ohio State University). Primers CC4042 (left primer,
5′-TCTGGGCTCAAACACTAAAACAC-3′), CC4043 (right primer,
5′-GAATCAGAGGTGACGTTGAAGAC-3′) and CC2449 (border primer,
5′-ATTTTGCCGATTTCGGAAC-3′) were used to genotype plants for the sid2-4 insertion. To generate
the ref8-1 sid2-4 mutant, we began by crossing a sid2-4 homozygous plant to a
confirmed REF8/ref8-1 heterozygote. Plants heterozygous for the ref8-1 mutation
among the F1 progeny of this cross were identified as previously described
and allowed to self-fertilize. In the resulting population, we identified plants with
the genotype REF8/ref8-1 sid2-4/sid2-4, which give rise to progeny with the genotypes
REF8/REF8 sid2-4/sid2-4, REF8/ref8-1 sid2-4/sid2-4 and ref8-1/ref8-1 sid2-4/sid2-4
at a ratio of 1:2:1. The ref8-2 (TAIR accession SALK_036132) mutant, as
well as the med5a (TAIR accession SALK_037472) and med5b (TAIR accession
SALK_011621) mutants used to generate the med5a/5b double mutant, were originally
obtained from the ABRC and have also been previously described. To
generate the med5a/5b ref8-1 triple mutant, we began by crossing a med5a/5b mutant 
plant to a confirmed REF8/ref8-1 heterozygote. Plants heterozygous for the ref8-1 
mutation among the F1 progeny of this cross were identified and allowed to self- 
fertilize as described earlier. We then used high-performance liquid chromato- 
graphy (HPLC) to screen F2 progeny of this self-fertilization for plants exhibiting 
the characteristic metabolite profile of med5a/5b homozygous double mutants. 
These plants were then genotyped using PCR as previously described to identify
plants with the genotype med5a/med5a med5b/med5b REF8/ref8-1. These plants 
were again allowed to self-fertilize, giving rise to med5a/med5a med5b/med5b 
REF8/REF8, med5a/med5a med5b/med5b REF8/ref8-1, and med5a/med5a med5b/ 
med5b ref8-1/ref8-1 plants in the F3 generation at a ratio of 1:2:1. An identical 
approach was used to generate med5a/5b ref8-2 and med5a/5b ref3-2 plants. To 
genotype for the presence of the ref3-2 allele, the primer pair CC2396
(5′-TTCCGTATCATGTTCGATAG-3′) and CC2397 (5′-AATGTCAATTTCCCAAAATC-3′)
was used in combination with HinfI digestion, exploiting the cleaved amplified
polymorphic sequence marker resulting from the point mutation in ref3-2. Genotyping
for the ref8-2 allele has been described previously. The wild type and all
mutants are in the Columbia-0 (Col-0) background. 
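
The 1:2:1 segregation expected when a REF8/ref8-1 heterozygote is selfed in a fixed homozygous background (sid2-4/sid2-4 or med5a/med5a med5b/med5b) can be enumerated directly. The following is a minimal sketch that assumes simple Mendelian inheritance at a single locus; the function name and data layout are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import product

def self_cross(allele_a, allele_b):
    """Enumerate offspring genotypes from selfing a plant carrying two alleles.

    Assumes simple Mendelian segregation: each gamete carries either allele
    with equal probability.
    """
    gametes = [allele_a, allele_b]
    offspring = Counter()
    for maternal, paternal in product(gametes, repeat=2):
        # Sort so that "REF8/ref8-1" and "ref8-1/REF8" count as one genotype.
        genotype = "/".join(sorted((maternal, paternal)))
        offspring[genotype] += 1
    return offspring

counts = self_cross("REF8", "ref8-1")
for genotype, n in sorted(counts.items()):
    print(genotype, n)
```

The three genotype classes appear in the proportions 1:2:1, which is the ratio used above to recover homozygous ref8-1 plants in each background.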

Histochemical staining. For histochemical staining, basal sections of primary inflorescence
stems were embedded in 5% agar and cut into 100 µm sections using a
vibratome. For phloroglucinol staining, sections were then incubated in 16% eth- 
anol (v/v), 10% HCl (v/v), 0.2% phloroglucinol (w/v) for 10 min, then washed, 
mounted in water, and examined using an Olympus Vanox-S light microscope. 
For Mäule staining, sections were fixed in 4% glutaraldehyde, then washed with
water, incubated in 0.5% KMnO4 (w/v) for 5 min and washed twice with water.
Finally, sections were mounted in concentrated NH4OH and examined under the
light microscope. 

Confocal laser scanning microscopy. Semi-thin sectioned samples were posi- 
tioned on glass microscope slides and stained with 0.1% acriflavin. Images were 
captured using a Nikon C1 Plus microscope, equipped with the Nikon C1 confocal 
system with four lasers (403 nm, 561 nm, 643 nm and Argon tuneable 458/477/488/ 
515 nm), and operated via Nikon’s EZ-C1 software. 

Cell wall thickness measurements. Cell wall thickness was measured directly from 
the confocal laser scanning microscopy (CLSM) images using tools within the Fiji 
(fiji.sc) image processing package. The high contrast of the acriflavin-stained 300- 
nm-thick sections imaged by CLSM allowed images of a group of similar cells such 
as vascular bundle fibre cells to be isolated as a region of interest (ROI), thresholded, 
and converted to binary easily and accurately (Extended Data Fig. 4). From the 
binary image, two operations were performed to derive a description of cell wall 
thickness. First, skeletonization provides a set of coordinates of the midpoint of 
the cell walls. Second, a distance map provides the Euclidean distance of each point
within the cell wall to the nearest non-cell-wall space. By combining these two, the 
distance measurement is reported for each point along the cell wall. 
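
The combination of a midline and a distance map can be illustrated on a small synthetic binary image. This is only a sketch under stated assumptions: it uses a brute-force distance calculation, and a simple ridge test on the distance map stands in for true skeletonization (the analysis described above used tools within Fiji). All names are hypothetical.

```python
import math

def distance_map(mask):
    """Euclidean distance from each wall pixel (1) to the nearest non-wall pixel (0)."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c] == 0]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist

def midline(mask, dist):
    """Wall pixels whose distance is not exceeded by any 4-connected neighbour.

    A crude stand-in for skeletonization, adequate only for simple shapes.
    """
    rows, cols = len(mask), len(mask[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if all(dist[r][c] >= dist[nr][nc]
                   for nr, nc in neighbours
                   if 0 <= nr < rows and 0 <= nc < cols):
                points.append((r, c))
    return points

# Synthetic binary image: a horizontal wall 5 pixels thick between two lumens.
mask = [[0] * 10] + [[1] * 10 for _ in range(5)] + [[0] * 10]
dist = distance_map(mask)
mid = midline(mask, dist)
# With pixel-centre distances, a wall of n pixels has a midline distance of (n + 1) / 2,
# so the thickness in pixels is recovered as 2 * d - 1 (an assumption of this sketch).
thickness = [2 * dist[r][c] - 1 for r, c in mid]
print(sorted(set(thickness)))
```

Reading the distance map only at midline points gives one thickness value per position along the wall, which is the quantity summarized per cell in Extended Data Fig. 4.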
Transmission electron microscopy. Stem sections were high-pressure frozen with a 
Leica EMPact2 high-pressure freezer in 0.2 mm deep planchets (Leica Microsystems) 
using 0.15 M sucrose as a cryoprotectant. Next, freeze substitution was carried out
in a Leica AFS2 automated freeze substitution unit in 1% OsO4 (EMS). The samples
were dehydrated by solution exchange with increasing concentrations of acetone.
After dehydration, the samples were infiltrated with Eponate 812 (EMS) by
incubating at room temperature for several hours to overnight in increasing con- 
centrations of resin diluted in acetone. The samples were transferred to capsules 
and the resin polymerized in a 60 °C oven overnight. Resin-embedded samples 
were sectioned to ~50 nm with a Diatome diamond knife on a Leica EM UTC 
ultramicrotome (Leica Microsystems). Sections were collected on 0.5% Formvar- 
coated slot grids (SPI Supplies) and were post-stained for 1 min with 1% aqueous 
KMnO4. Images were taken with a 4-megapixel Gatan UltraScan 1000 camera on
a FEI Tecnai G2 20 Twin 200 kV LaB6 transmission electron microscope. 

Plant cell wall preparation. Extractive-free plant cell walls for DFRC, NMR, and 
saccharification analyses were prepared essentially as previously described. Briefly, 
mature, dried Arabidopsis inflorescence stems were stripped of all cauline leaves 
and siliques, cut into small pieces, and finely ground in liquid nitrogen. Ground 
tissue was then serially extracted in ten volumes of 0.1 M sodium phosphate buffer 
(pH 7.2) at 50 °C for 1 h, six extractions in ten volumes of 70% ethanol (v/v) at 70 °C 
for 15 min, and a final extraction in 100% acetone for 10 min. Cell wall residue was 
then collected by centrifugation and dried overnight in a vacuum oven at 50 °C.
Thioglycolic acid quantification of lignin. For thioglycolic acid quantification of 
lignin, approximately 100 mg of fresh inflorescence stem material was flash-frozen 
in liquid nitrogen and ground in a microcentrifuge tube with a small pestle. Ground 
tissue was then extracted in ten volumes of 100% methanol at 80 °C for 2 h, collected
by centrifugation, washed with ten volumes of distilled water, and again
collected by centrifugation. Samples were then resuspended in 750 µl of distilled
water, 250 µl concentrated HCl and 100 µl thioglycolic acid and incubated for 3 h
at 80 °C. Samples were then collected by centrifugation, washed with 1 ml of distilled
water, resuspended in 1 ml of 1 M NaOH, and incubated with gentle shaking
for 12 h at room temperature. After spinning for 10 min at maximum speed in a
microcentrifuge, the supernatant was transferred to a new tube, to which was added
200 µl of concentrated HCl. Samples were then vortexed and incubated at 4 °C for
4 h. The precipitate of this reaction was then collected by spinning for 10 min at
maximum speed in a microcentrifuge, dissolved in 1 ml of 1 M NaOH, and the
absorbance at 280 nm determined using a spectrophotometer. 

DFRC. DFRC lignin analysis was performed essentially as previously reported.
Briefly, cell wall samples were dissolved in acetyl bromide/acetic acid solution, con- 
taining 4,4’-ethylidenebisphenol as an internal standard. The reaction products 
were dried down using nitrogen gas, dissolved in dioxane/acetic acid/water (5/4/1, 
v/v/v), reacted with Zn dust, purified with C-18 SPE columns (Supelco), and 
acetylated with pyridine/acetic anhydride (2/3, v/v). The lignin derivatives were 
analysed by gas chromatography/flame ionization detection using response factors 
relative to the internal standard of 1.26 for p-coumaryl alcohol peracetate, 1.30 for 
coniferyl alcohol peracetate and 1.44 for sinapyl alcohol peracetate. The same samples
were run through gas chromatography-mass spectrometry in parallel to confirm
the identity of the derived hydroxycinnamyl alcohol peracetates.
Preparation of cellulolytic enzyme lignins from Arabidopsis stem cell walls. 
Preparation of cellulolytic enzyme lignin (CEL) samples from Arabidopsis stem 
cell walls for NMR and gel-permeation chromatography (GPC) was as described 
previously. In brief, preground cell wall samples were extracted with 80% aqueous
ethanol (sonication 3 × 20 min). Isolated cell walls (~300 mg) were ball milled
(7 × 5 min milling and 5 min cooling cycles) using a Retsch PM100 ball mill vibrating
at 600 r.p.m. with ZrO2 vessels containing ZrO2 ball bearings. The ball-milled
walls (~250 mg) were transferred to centrifuge tubes and digested at 30 °C with
crude cellulases (Calbiochem Cellulysin; lot no. D00074989; 30 mg g⁻¹ of sample
in pH 5.0 acetate buffer; three times over 2 days, fresh buffer and enzyme added 
each time), leaving all of the lignin and residual polysaccharides (21%, 24% and 
20% isolation yields for the original cell walls of wild-type, med5a/5b and med5a/5b 
ref8-1 plants). For NMR and GPC analysis, the cellulase-digested cell walls (~35 mg) 
were subjected to solubilization and acetylation in DMSO/N-methylimidazole/ 
acetic anhydride to afford acetylated CELs (137%, 142% and 141% weight yields
for CELs from wild-type, med5a/5b and med5a/5b ref8-1 plants). 

Preparation of in vitro synthetic lignin polymer from p-coumaryl alcohol. The 
p-coumaryl alcohol precursor was synthesized as previously described. H-DHP
was generated via horseradish peroxidase (HRP)-catalysed polymerization using a
typical end-wise polymerization method: 240 ml of acetone/sodium phosphate
buffer (0.1 M, pH 6.5) (1:9, vol/vol) containing p-coumaryl alcohol (1 mmol), and
a separate solution of hydrogen peroxide (1.2 mmol) in 240 ml of water were added
by peristaltic pump over a 20 h period at 25 °C to 60 ml of buffer containing HRP
(Sigma-Aldrich; type VI, 250-330 U, 5 mg). The reaction mixture was further stirred
for 4 h and then acidified to pH ~3 with 1 M aqueous HCl. The precipitate was
collected by centrifugation (10,000g, 15 min), washed with ultrapure water (100 ml
× 3), and lyophilized (weight yield 91%). Acetylation of DHPs (~40 mg) used 1:1
acetic anhydride and pyridine (weight yield, 112%).
NMR spectroscopy. NMR spectra were acquired on a Bruker Biospin AVANCE
500 MHz spectrometer fitted with a cryogenically cooled 5 mm TCI gradient probe
with inverse geometry (¹H coils closest to the sample). Acetylated samples of Arabidopsis
CELs (~50 mg) or H-DHP (~40 mg) were dissolved in 0.5 ml of chloroform-d;
the central chloroform solvent peak was used as an internal reference (δC, 77.0; δH,
7.26 p.p.m.). HSQC experiments using Bruker’s adiabatic pulse version of the experiment
(hsqcetgpsisp.2) were carried out using the following parameters: acquired
from 10 to 0 p.p.m. in F2 (¹H) with 1,998 data points (acquisition time 200 ms), 200
to 0 p.p.m. in F1 (¹³C) with 400 increments (F1 acquisition time 8 ms) of 96 scans
with a 1.0 s interscan delay; the d24 delay was set to 0.89 ms (1/8J, J = 140 Hz). Processing
used typical matched Gaussian apodization in F2 and squared cosine-bell
apodization and one level of linear prediction (32 coefficients) in F1. Volume integration
of contours in HSQC plots used Bruker’s TopSpin 3.1 software and no
correction factors were used; that is, the data represent volume integrals only. For
quantification of H/G/S distributions, only the C2–H2 correlations from G units
and the C2–H2/C6–H6 correlations from H and S units were used, and the G integrals
were logically doubled. For rough estimation of the various interunit linkage
types, the following well-resolved contours were integrated: Aα, Bα, Cα, Dα, D′α,
Eα, X1 and X2; X1 and X2 are not included in the total, which reflects just the
interunit linkages; their percentages in Extended Data Fig. 5 are expressed as a percentage
of total interunits (A-E).
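
The doubling of the G integrals follows from the choice of contours: the H and S signals each arise from two equivalent positions, whereas the G signal arises from one. A minimal sketch of the normalization follows; the function name and the input integrals are hypothetical and are not data from the paper.

```python
def hgs_distribution(h_integral, g_integral, s_integral):
    """Convert HSQC volume integrals to percentage H/G/S composition.

    h_integral and s_integral are C2-H2/C6-H6 contours (two equivalent positions);
    g_integral is the C2-H2 contour only, so it is doubled before normalizing.
    """
    h, g, s = h_integral, 2.0 * g_integral, s_integral
    total = h + g + s
    return {unit: 100.0 * value / total for unit, value in (("H", h), ("G", g), ("S", s))}

# Hypothetical integrals in arbitrary units, not data from the paper.
composition = hgs_distribution(h_integral=2.0, g_integral=35.0, s_integral=28.0)
print(composition)
```

No other correction factors are applied here, in line with the statement that the data represent volume integrals only.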
HPLC analysis of soluble metabolites. To extract and measure hydroxycinnamate
esters and flavonols, 3-week-old rosette leaves were harvested and extracted
in 50% methanol (v/v) for 1 h at 65 °C at a tissue concentration of 100 mg ml⁻¹.
Insoluble material was sedimented by spinning for 5 min at 14,000 r.p.m. in a
microcentrifuge. Ten microlitres of the supernatant was then loaded onto a Shim-pack
XR-ODS column (Shimadzu; column dimensions 3.0 mm × 75 mm, bead size 2.2 µm)
and eluted at a flow rate of 0.7 ml min⁻¹ at an increasing concentration of acetonitrile
from 10% to 35% over 8.6 min in 0.1% aqueous formic acid. Sinapic acid,
p-coumaric acid and kaempferol were used as standards for quantification of sinapate
esters, p-coumarate esters and flavonols, respectively.
Cell wall saccharification assay. Pre-treatment was carried out by pressure cooking
50 mg samples in a metal tube containing 1.5 ml water at 200 °C (30 s of heat-up
time followed by a 10 min hold). Each tube was placed in a fluidized sand bath
(Tecam SBL-1; Cole-Parmer). The pressure within the tubes was held at the saturation
vapour pressure of water to keep the water in a liquid state. The samples
were cooled before the addition of 1.5 ml of 100 mM citrate buffer (pH 4.8), bringing
the final volume to 3 ml (~2% solids (w/v)). The enzyme hydrolysis for all the
conditions tested was based on initial solids loading and glucan concentration. Commercial
cellulase (Spezyme CP) at 50 filter paper units (FPU) g⁻¹ glucan (90 mg
protein per g glucan) and β-glucosidase (Novozyme 188) at 105 cellobiohydrolase
units (CBU) g⁻¹ glucan (34 mg protein per g glucan) were added, and hydrolysis
was carried out for different lengths of time at 50 °C and pH 4.8 in an incubator
shaker (New Brunswick Scientific). The ratio of enzyme to solids was equivalent to
10 FPU g⁻¹ total solids, 21 CBU g⁻¹ total solids and 25 mg protein per g total solids.
Enzyme hydrolysis of 50 mg untreated samples (also at ~2% solids (w/v), 50 °C
and pH 4.8) was carried out under similar experimental conditions.
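
The per-glucan and per-solids enzyme loadings quoted above are mutually consistent if glucan makes up roughly one-fifth of the solids. The short calculation below makes that check explicit; the 20% glucan fraction is inferred from the stated numbers and is not given in the text, and the variable names are illustrative.

```python
# Loadings per gram of glucan, as stated in the text.
per_g_glucan = {"FPU": 50.0, "CBU": 105.0, "protein_mg": 90.0 + 34.0}
# Loadings per gram of total solids, as stated in the text.
per_g_solids = {"FPU": 10.0, "CBU": 21.0, "protein_mg": 25.0}

# The ratio of the two gives the glucan fraction implied by each pair of numbers.
implied_glucan_fraction = {k: per_g_solids[k] / per_g_glucan[k] for k in per_g_glucan}
for name, fraction in implied_glucan_fraction.items():
    print(f"{name}: {fraction:.3f}")

# Enzyme needed for one 50 mg sample (0.05 g solids).
sample_g = 0.05
per_sample = {k: v * sample_g for k, v in per_g_solids.items()}
print(per_sample)
```

All three ratios come out at about 0.20, so a 50 mg sample would receive about 0.5 FPU of cellulase and about 1.05 CBU of β-glucosidase under these assumptions.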
GPC. Acetylated CELs isolated from Arabidopsis stem cell walls were dissolved in
dimethylformamide containing 0.1 M lithium bromide (~0.5 mg ml⁻¹), and subjected
to GPC analysis on a Shimadzu LC-20A LC system equipped with SPD-M20A
photodiode array (PDA) detector using the following conditions: column,
Tosoh TSK gel α-M plus α-2500; eluent, dimethylformamide with 0.1 M lithium
bromide; flow rate, 0.5 ml min⁻¹; column oven temperature, 40 °C; sample detection,
PDA response at 280 nm. The molecular weight calibration was via polystyrene
standards. Data acquisition and computation was done using Shimadzu LCsolution
v.1.25 software.
High-throughput mRNA sequencing. For global transcript analysis, we harvested 
total RNA from wild-type, med5a/5b, ref8-1 and med5a/5b ref8-1 plants. Three sam- 
ples were collected for each genotype, with each sample consisting of three whole 
rosettes for wild-type, med5a/5b and med5a/5b ref8-1 plants, and five whole rosettes 
for ref8-1 plants. All plants of a given sample were grown in parallel in the same pot.
Samples were harvested and flash frozen in liquid nitrogen on day 19 after planting,
6.5 h after subjective dawn, over a period of approximately 10 min. Frozen plant
tissue was then ground in liquid nitrogen using a mortar and pestle, and total RNA 
was extracted using the Qiagen RNeasy Plant Mini Kit (Qiagen Sciences) accord- 
ing to the manufacturer’s instructions. RNA samples were treated with DNase and 
purified using the Zymo Research DNA-Free RNA Kit (Zymo Research) according 
to the manufacturer’s instructions. 

cDNA libraries for sequencing were prepared using the Illumina TruSeq Kit 
(Illumina) according to the manufacturer’s instructions. Libraries were then titrated 
using a KAPA quantification kit (Kapa Biosystems), denatured and clustered on
both lanes of an Illumina Rapid Chemistry flow cell at 12 pM. Seventy-six bases of
sequence were collected from each of two paired-end reads, as well as an index 
read to assign each pair of reads to its sample of origin. 

To pre-process filtered Illumina reads for mapping, sequences were first assessed 
for quality using FastQC (v.0.10.1; http://www.bioinformatics.babraham.ac.uk), 
then trimmed using the FASTX toolkit (v.0.0.13.2; http://hannonlab.cshl.edu) to 
remove bases with a Phred33 score less than 30. Trimmed reads less than 40 bases 
in length were discarded. Quality-trimmed reads were then mapped to the bowtie2- 
indexed Arabidopsis genome using Tophat (v.2.0.9; http://tophat.cbcb.umd.edu/) 
with default parameters. The total number of reads from all 12 samples that unam- 
biguously mapped to a gene feature was 309,544,562 (average 25,795,380; min- 
imum 17,571,930; maximum 30,794,071). A counts matrix consisting of the raw 
read count for each gene feature in each sample was generated using HTSeq (v.0.5.3p7; 
http://www-huber.embl.de/users/anders/HTSeq/). To eliminate fold changes of 
infinity and divide-by-zero errors, the counts matrix was modified such that genes 
with 0 counts across all samples were removed, and remaining 0 count values were 
changed to 1. 
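
Two of the pre-processing steps above, quality trimming with a length cut-off and the zero-count adjustment of the counts matrix, can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline itself: trimming is applied from the 3′ end only (an assumption about how the FASTX toolkit was run), and the gene identifiers and counts are hypothetical.

```python
def trim_read(seq, qual, min_q=30, min_len=40, offset=33):
    """Trim low-quality bases from the 3' end; return None if the read gets too short.

    Trimming from the 3' end only is an assumption about how the toolkit was run.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]

def clean_counts(counts):
    """Drop genes with zero counts in every sample; set remaining zeros to 1."""
    cleaned = {}
    for gene, row in counts.items():
        if all(value == 0 for value in row):
            continue
        cleaned[gene] = [value if value > 0 else 1 for value in row]
    return cleaned

# Phred+33 encoding: 'I' encodes Q40 and '#' encodes Q2.
kept = trim_read("A" * 50, "I" * 45 + "#" * 5)
dropped = trim_read("A" * 50, "I" * 30 + "#" * 20)
print(len(kept[0]), dropped)

# Hypothetical gene identifiers and counts.
matrix = {"geneA": [0, 0, 0], "geneB": [12, 0, 7], "geneC": [3, 5, 9]}
cleaned = clean_counts(matrix)
print(cleaned)
```

Replacing isolated zeros with 1 keeps every fold change finite, at the cost of slightly understating changes for genes that are silent in one condition.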

Differential expression analysis was performed using the statistical program R
(v.3.0.1; http://www.r-project.org/) in conjunction with three analytical methods available
from Bioconductor (http://www.bioconductor.org): DESeq (v.1.12.1), edgeR
(v.3.2.4) and voom{limma} (v.3.16.6), as well as the Cufflinks (v.2.1.1) suite of pro- 
grams (http://cufflinks.cbcb.umd.edu/). All four statistical methods gave similar 
overall conclusions. We selected the most conservative results (DESeq; false dis- 
covery rate < 0.05) for further investigation and reporting. Venn diagrams were 
adapted from those created with the online tool Venny (http://bioinfogp.cnb.csic.es/tools/venny/)
and gene ontology term analysis was performed using the online
tool DAVID (v.6.7; http://david.abcc.ncifcrf.gov/home.jsp). 
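
The false discovery rate threshold used above rests on the Benjamini-Hochberg adjustment for multiple testing. DESeq performs this adjustment internally; the sketch below only illustrates the procedure on hypothetical P values and is not the package's own code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-gene P values.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
padj = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in padj]
print(padj)
print(significant)
```

In this example only the first two genes pass a false discovery rate of 0.05, although four of the five raw P values are below 0.05.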
Quantification of salicylic acid. To extract salicylic acid from wild-type, med5a/ 
5b, ref8-1 and med5a/5b ref8-1 plants, triplicate samples of 4-week-old, soil-grown 
plants were harvested, weighed and flash frozen in liquid nitrogen. Each sample 
consisted of a sufficient number of whole rosettes such that the total mass of the 
sample was at least 300 mg. Samples were ground in liquid nitrogen with a mortar 
and pestle and extracted at 4 °C for 24 h with mild shaking at a tissue concentration
of 50 mg ml⁻¹ in ice-cold methanol containing 500 µM 3,4,5-trimethoxy-trans-cinnamic
acid as an internal standard. Water and chloroform were then added such
that the ratio of methanol:water:chloroform was 2:1.2:1. Samples were then briefly
vortexed and placed at 4 °C for an additional 12 h, after which the upper phase was
transferred to a new tube, dried in a centrifugal evaporator (speedvac) and redissolved
in 100 µl of 65% methanol (v/v) per 300 mg of tissue originally used for
extraction. 

Quantification of SA was accomplished by HPLC-MS/MS on an Agilent 1200
system using a Waters Xterra MS C18 column (5 µm, 150 mm × 2.1 mm inner diameter).
A binary mobile phase consisting of 50 mM aqueous ammonium acetate
pH 7.0 (solvent A) and acetonitrile (solvent B) was used at a flow rate of 0.3 ml min⁻¹.
Initial conditions were set at 90:10 A:B with a linear gradient to 80:20 from 0 to
3 min, followed by a linear gradient to 30:70 from 3 to 13 min, followed by column
re-equilibration. After separation, the column effluent was introduced by negative
mode electrospray ionization (ESI) into an Agilent 6460 triple quadrupole mass
spectrometer. ESI capillary voltage was −3.2 kV, nebulizer gas pressure was set at
35 psi, drying gas temperature was 325 °C, flow rate was 9 l min⁻¹ and the fragmentor
voltage was set to 85 V. Multiple reaction monitoring transitions were 137.0 to 93.0 for
SA and 237.2 to 103.1 for the internal standard. Collision energies used were 14 eV
for SA and 10 eV for the internal standard. Mass data were collected and analysed
using Agilent MassHunter software (v.B.02). SA quantification was accomplished
using a standard curve over the range 0.01-50 µg ml⁻¹.
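
The two-step linear gradient used for the salicylic acid separation can be written as a set of breakpoints and interpolated to give the mobile-phase composition at any time. This is a minimal sketch; the function name is hypothetical, and the column re-equilibration step is not modelled.

```python
def percent_b(t_min, program):
    """Linearly interpolate the percentage of solvent B at time t_min.

    program is a list of (time_min, percent_B) breakpoints in ascending time order.
    """
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # Past the last breakpoint: hold the final composition (re-equilibration not modelled).
    return program[-1][1]

# Breakpoints taken from the text: 90:10 A:B at 0 min, 80:20 at 3 min, 30:70 at 13 min.
sa_gradient = [(0.0, 10.0), (3.0, 20.0), (13.0, 70.0)]
for t in (0.0, 1.5, 3.0, 8.0, 13.0):
    print(t, percent_b(t, sa_gradient))
```

The same function would describe the single-step gradient used for soluble metabolites if given the breakpoints (0, 10) and (8.6, 35).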
Extended Data Figure 1 | Disruption of MED5a/5b does not restore
normal hydroxycinnamate ester biosynthesis to ref8 mutants.
Quantification of the hydroxycinnamate esters sinapoylglucose,
sinapoylmalate, p-coumaroylglucose, p-coumaroylmalate and
p-coumaroylshikimate in rosettes of 3-week-old plants of the indicated
genotypes. Similar to ref8 mutants, med5a/5b ref8 plants fail to accumulate
wild-type levels of the sinapate esters sinapoylglucose and sinapoylmalate,
and instead hyperaccumulate the p-coumarate esters p-coumaroylglucose,
p-coumaroylmalate and p-coumaroylshikimate. In each case, data are derived
from five individual plants, except for ref8-2, where owing to their small size,
three groups of five, seven and seven plants each were combined into
independent pools. Error bars indicate s.d. in all cases.
Extended Data Figure 2 | Disruption of MED5a/5b alleviates the flavonoid
hyperaccumulation of ref8 mutants. Quantification of the three major
flavonol glucosides in rosettes of 3-week-old plants of the indicated genotypes.
In each case, data are derived from five single plants, except for ref8-2, where
three groups of five, seven and seven plants each were combined into
independent pools. Error bars indicate s.d. K-(Rha-Glu)-Rha, kaempferol
3-O-[6''-O-(rhamnosyl) glucoside] 7-O-rhamnoside; K-Glu-Rha, kaempferol
3-O-glucoside 7-O-rhamnoside; K-Rha-Rha, kaempferol 3-O-rhamnoside
7-O-rhamnoside.
Extended Data Figure 3 | med5a/5b ref8-2 mutants show patterns of
lignification similar to med5a/5b ref8-1 mutants. Shown are thin sections of
inflorescence stems of med5a/5b ref8-2 mutants stained with Mäule reagent
(left) and phloroglucinol (right). Plants were grown and stained in parallel with
those shown in Fig. 2a, b. Although med5a/5b ref8-2 mutant stems are
somewhat thinner than those of med5a/5b ref8-1 mutants and show some
morphological abnormalities, the overall staining patterns of med5a/5b ref8-1
and med5a/5b ref8-2 inflorescence stems are highly similar. The corresponding
tissues of ref8-2 mutant plants could not be examined owing to their
developmental arrest shortly after germination.
Extended Data Figure 4 | ref8-1 mutants show thickening of the secondary
cell wall that is rescued by disruption of MED5a/5b. Left, distance map of
cell wall thickness calculated from micrographs of representative samples of
stem cross-sections of plants of the indicated genotypes. Right, quantification
of cell wall thickness. N > 200 cells, with at least 100 measurements per cell for
each sample. Error bars represent s.d. ***P < 0.001, difference from wild type
(Student’s t-test).
Extended Data Figure 5 | Lignin of med5a/5b ref8-1 mutant plants differs
structurally from lignin of wild-type or med5a/5b mutant plants. 2D-NMR
spectra of lignin from the indicated genotypes. The data shown are derived
from a different region of the same spectra shown in Fig. 3. Colour-coded
structures on right correspond to the major resonances in each spectrum.
Extended Data Figure 6 | High-molecular-weight lignin polymers are
underrepresented in med5a/5b ref8-1 mutants. Shown are the results of 
gel-permeation chromatography of lignin from wild-type, med5a/5b and 
med5a/5b ref8-1 cell walls. The x-axis indicates the apparent molecular weight 
of individual lignin polymer fragments and is shown as a log scale. The y-axis 
shows the response of an ultraviolet-light detector normalized to the most 
abundant signal in each chromatogram. The most abundant signal in all 
samples corresponds to a molecular weight of ~10,000 Da, whereas a
secondary peak at ~250,000 Da is significantly underrepresented in lignin 
derived from the med5a/5b ref8-1 mutant. 
Extended Data Figure 7 | Expression of lignin biosynthesis genes in
wild-type, med5a/5b, ref8-1 and med5a/5b ref8-1 plants. Shown is the
expression of general phenylpropanoid and lignin biosynthesis genes in
3-week-old rosettes of plants of the indicated genotypes as determined using
high-throughput sequencing of mRNA. The value shown on the y-axis refers to
the number of reads unambiguously mapping to each gene, normalized for
differences in the total number of reads between samples and for lane effects.
*P < 0.05, difference from wild type, as determined by the DESeq algorithm
using a Benjamini-Hochberg procedure to adjust for multiple testing.
Extended Data Figure 8 | Expression of flavonoid biosynthesis genes in
wild-type, med5a/5b, ref8-1 and med5a/5b ref8-1 plants. Shown is the
expression of flavonoid biosynthesis genes in 3-week-old rosettes of plants of
the indicated genotypes as determined using high-throughput sequencing of
mRNA. The value shown on the y-axis refers to the number of reads
unambiguously mapping to each gene, normalized for differences in the
total number of reads between samples and for lane effects. *P < 0.05,
difference from wild type, as determined by the DESeq algorithm using a
Benjamini-Hochberg procedure to adjust for multiple testing.
Extended Data Figure 9 | A model for Mediator-dependent growth
inhibition in Arabidopsis ref8 mutants. Mutation or disruption of REF8 leads 
to direct alterations in the composition of the cell wall and other metabolic 
changes due to the loss of C3’H activity. Information on these changes is 
relayed to the nucleus by an at present unknown signalling pathway or sensor, 
resulting in massive changes in gene expression (represented by green and red 
transcripts in the model). Some of these transcriptional changes are directly 
dependent on MED5 (centre, illustrated as a direct MED5-transcription-factor
interaction), whereas others are independent of MED5 (left) or are indirectly
affected by MED5 (right), such as genes controlled by transcription factors 
that are themselves MED5-dependent targets. Ultimately, changes in the 
transcription of direct and/or indirect targets of MED5 result in inhibition of 
growth, sterility and indirect effects on cell wall architecture, all of which can be 
rescued by disruption of MED5. 
Extended Data Figure 10 | med5a/5b ref8-1 mutants show elevated levels
of salicylic acid and disruption of SID2 does not rescue the stunted
growth of the ref8-1 mutant. a, Shown is the quantification of salicylic acid in
3-week-old rosettes of plants of the indicated genotypes. Data for each genotype
are derived from three independently pooled samples representing 300 mg
of whole rosette tissue each. Error bars represent s.d. **P < 0.01, difference
from wild type (Student’s t-test). b, Shown are 3-week-old rosettes of
representative plants of the indicated genotypes. The SID2 gene encodes the
salicylic acid biosynthetic enzyme isochorismate synthase. The sid2-4 and
ref8-1 sid2-4 plants shown are representative progeny of a single plant with the
genotype sid2-4/sid2-4 REF8/ref8-1 that gave rise to both morphologically
normal and dwarfed plants at a ratio of 3:1. N = 167 morphologically normal,
50 dwarfed; χ² = 0.444, P = 0.502.
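
The segregation statistic in the legend above can be checked with a goodness-of-fit test against a 3:1 expectation. The sketch below assumes a chi-square test with one degree of freedom and no continuity correction, which is an inference from the reported values rather than a stated method; the function name is hypothetical.

```python
import math

def chi_square_3_to_1(n_normal, n_dwarf):
    """Goodness-of-fit test of observed counts against a 3:1 expectation (1 d.f.)."""
    total = n_normal + n_dwarf
    expected = (0.75 * total, 0.25 * total)
    observed = (n_normal, n_dwarf)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = chi_square_3_to_1(167, 50)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

With 167 normal and 50 dwarfed plants the statistic matches the reported 0.444, and the P value comes out close to the reported one; either way the counts do not depart significantly from 3:1.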
Hepatitis C virus (HCV) is a significant public health concern with
approximately 160 million people infected worldwide. HCV infection
often results in chronic hepatitis, liver cirrhosis and hepatocellular
carcinoma. No vaccine is available and current therapies
are effective against some, but not all, genotypes. HCV is an enveloped
virus with two surface glycoproteins (E1 and E2). E2 binds to the host
cell through interactions with scavenger receptor class B type I (SR-BI)
and CD81, and serves as a target for neutralizing antibodies. Little
is known about the molecular mechanism that mediates cell entry
and membrane fusion, although E2 is predicted to be a class II viral
fusion protein. Here we describe the structure of the E2 core domain
in complex with an antigen-binding fragment (Fab) at 2.4 Å resolution.
The E2 core has a compact, globular domain structure, consisting
mostly of β-strands and random coil with two small α-helices.
The strands are arranged in two, perpendicular sheets (A and B),
which are held together by an extensive hydrophobic core and disulphide
bonds. Sheet A has an IgG-like fold that is commonly found in
viral and cellular proteins, whereas sheet B represents a novel fold. 
Solution-based studies demonstrate that the full-length E2 ectodo- 
main has a similar globular architecture and does not undergo sig- 
nificant conformational or oligomeric rearrangements on exposure 
to low pH. Thus, the IgG-like fold is the only feature that E2 shares 
with class II membrane fusion proteins. These results provide unpre- 
cedented insights into HCV entry and will assist in developing an 
HCV vaccine and new inhibitors. 

HCV envelope glycoprotein 2 (E2) is a type I transmembrane protein 
with an amino-terminal ectodomain connected to a carboxy-terminal 
transmembrane helix through an amphipathic, α-helical stem (Fig. 1a).
E2 is highly modified post-translationally with 9-11 N-linked glyco- 
sylation sites and 18 cysteine residues that are conserved across all 
genotypes. For ease of comparison with other genotypes, we refer to 
the cysteines and N-linked glycosylation sites as C1 to C18 and N1 to 
N11, respectively, with residue numbers from the J6 genotype (2a) given 
in parentheses. Full-length E2 ectodomain (eE2) (384-656) was produced
in N-acetylglucosaminyltransferase I-negative (GnTI−) HEK293T
cells by a lentiviral expression system and grown in an adherent cell 
bioreactor. The resulting eE2 protein is monomeric as determined by 
non-reducing SDS-polyacrylamide gel electrophoresis (PAGE) and 
size-exclusion chromatography (Extended Data Fig. 1). 

Solution-based studies using limited proteolysis and hydrogen deu- 
terium exchange demonstrated that approximately 80 amino acids on 
the N terminus (384-463) from hypervariable region (HVR) 1 through 
to HVR2 are exposed and flexible. This region includes conserved 
sequences implicated in binding to the cellular receptors (SR-BI and 
CD81) as well as several epitopes for neutralizing antibodies (Fig. 1 and 
Extended Data Figs 2 and 3). Various N-terminal deletions were
produced to minimize regions of disorder while preserving an even number 
of cysteines, potentially allowing for intramolecular disulphide-bond 
formation. All constructs were screened for aggregation by non-reducing
SDS-PAGE and size-exclusion chromatography. E2 core (456-656) is
soluble, monomeric and maintains similar secondary structure content 
when compared with eE2 as determined by reactivity towards HCV- 
infected patient sera (Extended Data Fig. 4a, b) and circular dichroism 
(data not shown). However, in contrast to eE2, CD81 binding affinity 
and the efficiency of inhibition of HCV cell culture (HCVcc) entry were 
diminished for the E2 core (Extended Data Fig. 4c-e). This indicates 
that the N terminus of eE2 is critical for CD81 interaction and probably 
undergoes a transition from disorder to order on binding. Alternatively, 
the N-terminal region may also be ordered through interactions with 
other factors, for example, E1, apolipoproteins, lipids, cellular receptors, 
or antibodies. 

Monoclonal antibodies were generated against recombinant eE2 
and crystals of deglycosylated E2 core were produced in complex with 
a Fab (2A12) and diffracted to 2.4 Å resolution (Fig. 1 and Extended 
Data Table 1). The complex structure was determined by molecular 
replacement using a Fab structure followed by iterative rounds of model 
building and refinement. The E2 core domain has a globular fold, consisting 
of mostly β-strands and random coil with two short α-helices, 
which is consistent with previous spectroscopic studies of eE2 (refs 12, 13). 
The protein contains two four-stranded antiparallel β-sheets (termed 
sheets A and B), the planes of which are approximately perpendicular 
to each other. The four strands of the N-terminal β-sheet (sheet A) are 
stabilized by two disulphide bonds, between strands 1 and 3 (C7 (510) 
and C8 (554)) and the N-terminal loop with strand 4 (C5 (496) and C9 
(566)). The loop between strands 2 and 3 contains sequences impli- 
cated in CD81 binding and is flexible, similar to the N-terminal CD81 
binding sites, which were deleted’*”’. After strand 4, the polypeptide 
continues into a long, disordered loop before forming the first short 
helix (H1) followed by the second β-sheet (sheet B). A second short 
α-helix (H2) is located between strands 6 and 7. A disulphide bond 
(C14 (611) and C16 (648)) between strand 6 and the C-terminal strand 
8 further stabilizes the fold. The C-terminal strands (7 and 8) are the 
longest within the protein with approximately nine amino acids each 
and encompass the 2A12-binding site. 2A12 does not neutralize HCV 
infection, indicating that the epitope is either buried within the particle 
or incapable of preventing entry (Extended Data Fig. 4f). The two 
β-sheets are held together by (1) two disulphide bonds, connecting 
the loops before strand 1 and after H2 (C4 (488) with C15 (624)) as 
well as the loops after strand 4 and before H1 (C10 (571) and C13 
(601)), and (2) an extensive hydrophobic core consisting of numerous 
aromatic residues (Extended Data Fig. 5). 

HCV belongs to the genus Hepacivirus of the Flaviviridae family. 
Other members of the family include the flavivirus and pestivirus 
genera, which consist of arthropod-borne viruses and important live- 
stock pathogens, respectively’®. The flavivirus envelope glycoprotein 
(E) is a class II fusion protein and HCV E2 was expected to have a 
similar fold’*’”"*. All class II fusion proteins have a common elongated 
structure, consisting of predominantly β-sheets, and exist as homo- or 
Figure 1 | Overview of HCV E2. a, Schematic representation of the HCV 
genome and E2 domain organization. Full-length eE2 and the crystallization 
construct are indicated by the black and grey bars, respectively. C, capsid 
protein; non-structural protein (NS) 2-5B. Asterisks indicate the location of 
trypsin (blue), chymotrypsin (green) and GluC (magenta) cleavage sites. 
b-d, Ribbon diagram of the E2 core domain bound to Fab 2A12 (b) and alone 


heterodimers with the membrane-fusion, hydrophobic peptide buried 
at the dimer interface at neutral pH. On receptor binding and/or expo- 
sure to low pH, these proteins undergo self-rearrangement into stable 
trimers, exposing the fusion peptide and resulting in viral and host 
membrane fusion. Despite containing a similar extended organization, 
the recent structure of the pestivirus bovine viral diarrhoea virus (BVDV) 
E2 glycoprotein does not represent a typical class II fusion protein fold 
and lacks an apparent fusion peptide, indicating that it is unlikely to be 
a class II fusion protein’?”®. 

Similar to the flavivirus and pestivirus glycoproteins, the HCV E2 
core secondary structure consists of predominantly β-sheets and random 
coil. However, E2 core is a monomer with a compact globular 
shape, in contrast to the extended structures reported in other viruses. 
Solution-based small-angle X-ray scattering (SAXS) was used to cor- 
relate the crystallographic core domain structure with fully glycosy- 
lated eE2 and various fragments. The ab initio SAXS envelopes of E2 
core and eE2 are similar, with approximately the same radius of gyra- 
tion (Rg) (Fig. 2a, b and Extended Data Table 2). Glycosylation, which 
is missing in the E2 core crystal structure, represents roughly one-third 
of the mass and accounts for the unmodelled areas of the envelopes. 
Notably, neither the Rg nor the elution profiles on size-exclusion chro- 
matography for fully glycosylated eE2 and E2 core changed signifi- 
cantly at pH 5.0 (Extended Data Fig. 6a, b). These results indicate that 
unlike class II membrane fusion proteins, E2 does not undergo signifi- 
cant structural rearrangements on exposure to low pH. 
(c and d). The view in d is a 90° rotation about a horizontal axis from c. The E2 
polypeptide chain is coloured from the N terminus (blue) to C terminus (red). 


e, Topology diagram of E2 core domain, detailing secondary structure 
elements, disulphide bonds (dashed lines labelled with SS), N-linked 
glycosylation sites and regions of disordered polypeptide (dotted lines). 


SAXS was used to investigate the CD81 binding region on the E2 
ectodomain. To simplify data interpretation, eE2(ΔHVR1) was used, 
as HCV lacking HVR1 remains infectious’'. The binding site of CD81 
was identified by superimposing the SAXS envelopes of eE2(ΔHVR1) 
alone and in complex with CD81-LEL (Fig. 2c-e). Although CD81-LEL 
is a dimer in solution (Extended Data Fig. 6c), the extra density in the 
SAXS envelope is more consistent with monomeric binding; however, 
a dimer cannot be ruled out. 

HCV E2 is modified by N-linked glycosylation, which is necessary 
for proper folding and immune evasion. E2 from the J6 genotype has 
11 glycosylation sites. Four of the glycosylation sites are in the flexible 
N-terminal region, which was deleted, and seven are in the core domain 
(N5-N11). The locations of N7, N8, N10 and N11 are modelled in the 
final E2 core structure. All of these glycans are present in loop areas, 
indicating that these sites are solvent exposed and flexible. Mutagenesis 
studies in HCVcc have shown that N6, N8 and N10 are integral for 
virus infectivity. Removal of the N6 site results in improved CD81 
binding, whereas N8 and N10 mutations destabilize the protein and 
cause defective particle production”. Both sheets have one critical 
glycosylation site: N8 in sheet A and N10 in sheet B. All four of the 
observed glycosylation sites are on the periphery of the core and are 
located on a highly basic surface (Fig. 3). The opposite surface is pre- 
dominantly hydrophobic and highly conserved when compared to the 
basic surface. Furthermore, the epitope for antibodies (AR1, AR3A and 
AR3C) that inhibit E1E2 binding to CD81 is located at the interface of 
Figure 2 | Ab initio SAXS envelopes of E2 core, eE2(ΔHVR1) and eE2. 
a-d, SAXS envelopes of glycosylated E2 core (a), eE2 (b), eE2(ΔHVR1) (c) and 
eE2(ΔHVR1) in complex with CD81-LEL (d). The E2 core domain structure 


the hydrophobic and basic surfaces, including the N7 glycosylation site 
(Extended Data Fig. 7a). Interestingly, N7 is only 7 residues away from 
N6, which has a critical effect on CD81 binding. Epitopes for antibodies 
(that is, AR5) that block E1E2 heterodimerization are also found on the 




has been fitted into a and b. e, Superposition of the SAXS envelopes of 
eE2(ΔHVR1) alone (c) and in complex with CD81-LEL (d), highlighting the 
approximate position of CD81-LEL. 


hydrophobic surface, making it highly plausible that this surface is 
interacting with E1 in the context of the viral particle’. 

The precise roles played by E1 and E2 in membrane fusion are not 
fully understood. It has been predicted that amino acids 262-290 in E1 


Figure 3 | Surface features of E2. The surface of the E2 core domain coloured 
for electrostatic potential (a and d; blue (basic), white (neutral), red (acidic) 
at ±5 kT e⁻¹) and sequence identity (b and e) (green) from the alignment in 
Extended Data Fig. 2. c, f, Ribbon diagrams highlighting the location of the 
N-linked glycosylation sites. The orientations of d-f as well as a-c are identical. 
The orientation in d-f is rotated 180° about a horizontal axis from the view 
in a-c. 
as well as 416-430, 504-522 and 604-624 in E2 are important for 
fusion’*****. In the structure, the potential fusion regions in E2 
(504-522 and 604-624) are located in secondary structure elements 
within the hydrophobic core and therefore unlikely to serve as the 
fusion peptide. Furthermore, size-exclusion chromatography and 
SAXS analyses at low pH indicate that E2 does not undergo oligomeric 
or structural rearrangement. Thus, it seems unlikely that E2 has a 
direct role in membrane fusion. However, it is possible that E1 alone 
or the E1E2 heterodimer has a major role in the fusion process. 

Structural comparison of the HCV E2 core domain with all known 
folds in the Protein Data Bank using the Dali server’ identified pro- 
teins with IgG-like folds similar to the N-terminal sheet A, none of 
which is a class II fusion protein, although IgG-like folds are common 
in these proteins. The server failed to identify any statistically signifi- 
cant structures to sheet B, suggesting a novel fold. During the review 
process of this manuscript, a structure for HCV E2 from genotype 1a 
was published’*. The core domain of both structures is highly similar 
with a root mean squared deviation of 0.8 Å for similar carbon α 
atoms. Our biochemical and structural data provide valuable informa- 
tion towards defining the role of E2 and establish a foundation for 
further studies in understanding HCV entry and infection. 
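
For readers unfamiliar with the similarity measure quoted above, the following is a minimal sketch of how a root mean squared deviation over paired Cα atoms is computed. It assumes the two coordinate sets have already been superposed and matched atom for atom; it does not perform the superposition itself and is not the procedure used for the comparison reported here. The function name and the example coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two paired lists of (x, y, z) points.

    Assumes the structures are already superposed and that atom i in coords_a
    corresponds to atom i in coords_b (illustrative helper, not from the paper).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical example: three C-alpha positions, each displaced by 0.8 along x.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.8, 0.0, 0.0), (4.6, 0.0, 0.0), (8.4, 0.0, 0.0)]
example_rmsd = rmsd(model_1, model_2)
```

In practice the deviation depends on which atoms are paired and how the structures are superposed, so values from different tools are only comparable when those choices match.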


METHODS SUMMARY 


Recombinant, full-length E2 ectodomain (384-656, genotype J6), eE2(ΔHVR1) 
(413-656) and E2 core (456-656) were produced in HEK293T GnTI⁻ cells”’ by a 
lentiviral expression system with a C-terminal, protein-A tag. Stable cell lines were 
grown in an adherent cell bioreactor. Collected supernatants were purified over 
IgG sepharose and the E2 proteins were eluted by PreScission Protease digestion. 
The eluted protein was deglycosylated with endoglycosidase H (EndoH) and 
purified to homogeneity by heparin affinity and size-exclusion chromatography. 
Monoclonal antibody 2A12 secreted by hybridoma cells maintained in suspension 
culture was purified over protein G resin. Fab was generated using papain diges- 
tion and subtractive protein A chromatography. E2 and 2A12 Fab complex was 
purified by size-exclusion chromatography and crystals were obtained in 18% (w/v) 
PEG 3350, 0.5 M MgCl₂, 0.1 M HEPES (pH 7.5), 15% dioxane and 2% formamide at 
20 °C by the hanging-drop vapour diffusion method. The crystals belong to space 
group P2₁2₁2 with cell parameters a = 85.96 Å, b = 194.57 Å and c = 37.92 Å. The 
structure was determined by molecular replacement to 2.4 Å resolution using a 
mouse Fab (Protein Data Bank accession 2GSI). The final model has an Rwork and 
Rfree of 0.217 and 0.269, respectively. SAXS measurements were performed on eE2, 
E2 core, eE2(ΔHVR1) and eE2(ΔHVR1)/CD81-LEL. Data were analysed using 
BioXTAS RAW” and applications from the ATSAS” program suite. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 

J6 E2 expression. eE2, eE2(ΔHVR1) and E2 core domain encompass residues 
384-656, 413-656 and 456-656 from the HCV J6 genome, respectively. Owing to 
incomplete deglycosylation at N7 (542) with EndoH, the crystallization construct 
contained an asparagine to glutamine mutation at this position. The expression 
constructs consisted of a CMV promoter, a prolactin signal sequence, E2 fragment, 
PreScission Protease cleavage site and a C-terminal protein-A (ProtA) tag. The 
entire prolactin-E2-ProtA sequence was PCR amplified and cloned into the pJG 
lentiviral vector (J. Shires). 

Wild-type and GnTI⁻ HEK293T cells” (provided by D. Comoletti) were maintained 
in Dulbecco’s Modified Eagle Medium (DMEM) with 10% fetal bovine 
serum (FBS) at 37 °C with 5% CO₂. One day before the planned transfection, a 
single T-225 monolayer flask was seeded with 6.0 × 10⁶ HEK293T cells. 90 µg 
pJG-E2, 60 µg psPAX2 (HIV Gag-Pol packaging vector), 30 µg pMD2.G (VSV 
glycoprotein vector) and 450 µl of 2 M CaCl₂ were mixed and brought to a final 
volume of 4.5 ml with ddH₂O. 4.5 ml of 2× HEPES buffered saline was added at 
room temperature. After a 2-min incubation, the mixture was added directly to 
HEK293T cells. After 6-8 h, the media was replaced with DMEM with 10% FBS 
and 1% antibiotic/antimycotic (A/A) media and incubated for another 70h. 

Two days after transfection, 10,000 GnTI⁻ HEK293T cells were seeded into a 
single well of a 96-well plate. The supernatant from the transfection, containing 
the recombinant lentiviruses, was collected and centrifuged for 30 min at 4,000g at 
4 °C to pellet large cellular debris. Clarified supernatant was transferred to a 
Beckman Ultracentrifuge tube and virus was pelleted for 1.5 h at 25,000 r.p.m. 
(80,000g) at 4 °C in an SW28 rotor. Supernatant was discarded and the pellet 
re-suspended in 120 µl of DMEM containing 20% FBS, 1% A/A, and 8 µg ml⁻¹ of 
polybrene. 60 µl of virus suspension was added to the prepared GnTI⁻ HEK293T 
cells and incubated overnight. Infected cells were expanded and ultimately seeded 
into an adherent cell bioreactor (Cesco Bioengineering) for long-term growth and 
protein production. 
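
The ultracentrifugation step above quotes both a rotor speed (25,000 r.p.m.) and a relative centrifugal force (80,000g). The two are related by the standard conversion RCF = 1.118 × 10⁻⁵ × r × N², with r the rotation radius in centimetres and N the speed in r.p.m. The sketch below only illustrates that conversion and back-calculates the radius implied by the stated pair of values; the function names are hypothetical, and the rotor's actual radii should be taken from the manufacturer's documentation.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in r.p.m.


def rcf_from_rpm(radius_cm, rpm):
    """Relative centrifugal force (in multiples of g) at a given radius and speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def implied_radius_cm(rcf, rpm):
    """Radius (cm) at which the given speed would produce the given RCF."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Values quoted in the text: 25,000 r.p.m. and 80,000g.
radius = implied_radius_cm(80_000, 25_000)      # about 11.4 cm
round_trip = rcf_from_rpm(radius, 25_000)       # recovers 80,000
```

An implied radius of roughly 11.4 cm lies inside a swinging-bucket tube rather than at its tip, so a force quoted this way is best read as an approximate figure and not as the maximum force at the bottom of the tube.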
eE2, eE2(ΔHVR1) and E2 core purification. Cell supernatant containing 
E2-ProtA was centrifuged for 10 min at 7,000g, filtered through a 0.22-µm membrane, 
and loaded onto an IgG FF column (GE Healthcare). The column was extensively 
washed with 20 mM sodium phosphate pH 7.0 and then equilibrated with 20 mM 
HEPES pH 7.5, 250 mM NaCl and 5% glycerol. PreScission Protease was added 
into the column at approximately 400 µg l⁻¹ of supernatant and incubated overnight 
at 4 °C. For deglycosylation, the pH of the protein solution was adjusted 
using 1 M sodium citrate pH 5.5 to a final concentration of 100 mM. EndoH was 
added at a ratio of 1 mg per 2 mg of E2 and the reaction was incubated at room 
temperature for 3-4 h. The deglycosylated proteins were desalted into 20 mM 
HEPES pH 7.5, 50 mM NaCl, and 5% glycerol and purified by heparin affinity 
followed by size-exclusion chromatography over a Superdex200 column. Final 
yields for all E2 proteins averaged 30 mg l⁻¹ of supernatant. 
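
The deglycosylation step above is specified by a target citrate concentration and an enzyme-to-substrate mass ratio, not by volumes. As a minimal sketch of the underlying arithmetic, assuming the eluate contains no citrate beforehand and that volumes are additive (the function names and the 9 ml and 3 mg example values are hypothetical):

```python
def stock_volume_to_add(sample_volume_ml, stock_molar, target_molar):
    """Volume of stock (ml) to add so that the mixture reaches target_molar.

    Solves stock * v = target * (sample + v) for v, which accounts for the
    volume increase caused by adding the stock itself.
    """
    if not 0 < target_molar < stock_molar:
        raise ValueError("target must be positive and below the stock concentration")
    return sample_volume_ml * target_molar / (stock_molar - target_molar)


def endoh_mass_mg(e2_mass_mg, endoh_per_e2=0.5):
    """EndoH mass for a given E2 mass at the 1 mg per 2 mg ratio stated in the text."""
    return e2_mass_mg * endoh_per_e2


# Hypothetical batch: 9 ml of eluate containing 3 mg of E2.
citrate_ml = stock_volume_to_add(9.0, stock_molar=1.0, target_molar=0.1)  # 1 ml
enzyme_mg = endoh_mass_mg(3.0)                                            # 1.5 mg
```

Adding one-ninth of the sample volume, not one-tenth, is what gives a tenfold dilution of the 1 M stock in the final mixture.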

Crystallization. A 1.1:1 molar ratio of E2 core to Fab was incubated for 1-2 h at 
4 °C. The complex was purified over a Superdex200 column equilibrated with 
20 mM HEPES pH 7.5 and 100 mM NaCl. The complex was concentrated to 
5-7 mg ml⁻¹ and crystals were grown by the hanging-drop vapour diffusion 
method. Briefly, 2.5 µl of complex was mixed with an equal volume of reservoir 
solution, comprising 22% (w/v) PEG 3350, 0.5 M MgCl₂, 0.1 M HEPES pH 7.5, 
and 15% (v/v) dioxane. Initially, clusters of plate-like crystals grew in 3-4 days. 
Single, plate-like crystals were obtained via microseeding using a similar reservoir 
solution supplemented with 2% (v/v) formamide. Crystals were cryoprotected 
using reservoir solution with 24% (v/v) ethylene glycol and flash cooled in liquid 
nitrogen. Data were collected at a wavelength of 0.979 Å using beamline X25 of the 
National Synchrotron Light Source (NSLS), Brookhaven National Laboratory. 
Structure determination and refinement. The crystals belong to space group 
P2₁2₁2 with cell parameters a = 85.96 Å, b = 194.57 Å, c = 37.92 Å. Phases were 
determined by the molecular replacement method using PHENIX” and the coor- 
dinates from chains A and B from PDB entry 2GSI. Unambiguous placement of 
the Fab heavy and light chains provided the necessary phases to extend the map to 
cover E2 core domain using iterative rounds of model building and density modi- 
fication by COOT*!, PHENIX, REFMAC*” and PARROT”. The final model was 
built to a resolution of 2.4 Å, comprising residues 492-522, 538-571 and 596-649 
of E2 from the J6 genome, 1-217 of 2A12 light chain, and 1-133 and 136-218 of 
2A12 heavy chain with two N-linked N-acetylglucosamines, six molecules of 
formamide and 141 solvent molecules. The model coordinates were refined to Rwork 
0.217 and Rfree 0.269. Model validation demonstrated 95.0% of the residues located 
in the most favourable region of the Ramachandran plot with 4.8% in the gener- 
ously allowed regions”. Statistics of the data processing and structure refinement 
are summarized in Extended Data Table 1. 

Small angle X-ray scattering (SAXS). Glycosylated E2 proteins were purified 
over IgG and anion exchange columns. The proteins were equilibrated with either 
pH 7.5 buffer (50 mM HEPES pH 7.5, 250 mM NaCl and 1% glycerol) or pH 5.0 
buffer (50 mM sodium citrate pH 5.0, 250 mM NaCl and 1% glycerol) by 
Superdex200 gel filtration column. Glycosylated eE2(ΔHVR1) alone or in complex 
with CD81-LEL (1:2 molar ratios) was purified using pH 7.5 buffer by gel filtration 
chromatography. Three concentrations of each protein were prepared along with 
their respective buffers as background control. SAXS data were collected on the 
SIBYLS beamline at the Advanced Light Source, Lawrence Berkeley National 
Laboratory. Sample analysis and processing were performed using BioXTAS 
RAW’, ATSAS”, and GNOM™. The ab initio models were calculated using the 
application DAMMIF**. Consensus models and the normalized spatial discrep- 
ancy (NSD) values were calculated by averaging 10 ab initio models using the 
application DAMAVER™. X-ray structures of CD81 (PDB ID 1G8Q), HCV E2 
core and ab initio models were aligned using the application SUPCOMB”. 
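
Radius of gyration values from SAXS data of this kind are conventionally estimated from the low-angle region with the Guinier approximation, ln I(q) ≈ ln I(0) − (Rg² q²)/3. The sketch below fits that straight line by ordinary least squares on synthetic data. It is a generic illustration and not the RAW or ATSAS procedure used here; the function name, the q-range and the Rg of 25 used to generate the test curve are all assumptions.

```python
import math


def guinier_fit(q_values, intensities):
    """Estimate (Rg, I0) from a least-squares line through ln I versus q^2.

    Illustrative helper: assumes the supplied points already lie within the
    Guinier region (roughly q * Rg below about 1.3) and are background-corrected.
    """
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("slope is not negative; data do not follow Guinier behaviour")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic curve generated with Rg = 25 and I(0) = 100 (arbitrary test values).
true_rg, true_i0 = 25.0, 100.0
q_grid = [0.005 + 0.002 * k for k in range(20)]          # q * Rg stays below 1.1
i_grid = [true_i0 * math.exp(-(true_rg * q) ** 2 / 3.0) for q in q_grid]
fitted_rg, fitted_i0 = guinier_fit(q_grid, i_grid)
```

With real measurements the fit range has to be chosen with care, because aggregation or interparticle effects bend the low-q points away from the line.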
Hydrogen deuterium exchange. HD exchange experiments were conducted as 
described previously**. Briefly, 5 µl of deglycosylated eE2 (1.5 mg ml⁻¹), in 200 mM 
NaCl, 20 mM HEPES pH 7.5, was incubated with 15 µl of the same buffer made 
with 99.96% ²H₂O (Cambridge Isotope Laboratories) for 10, 100, or 1,000 s and 
quenched in 30 µl of 2 M urea, 0.8% formic acid and 50 mM tris(2-carboxyethyl) 
phosphine (TCEP). The reaction mixture was immediately frozen on dry ice until 
injection. For the zero time point experiment, the protein was incubated in the 
buffer made with ¹H₂O and then quenched and frozen. To correct for background 
exchange, a completely deuterated sample was produced by incubating the protein 
with 100 mM TCEP in 99.96% ²H₂O overnight before being quenched and frozen. 
A Dionex RSLC with a C18 column (2.1 × 50 mm, 3 µm, Q-C18, 150 Å, CMP Scientific) 
and an LTQ Velos Orbitrap Pro were used for LC-MS analysis. The mass was measured 
using the Orbitrap with a resolution of 60,000 and a mass range from 300 to 2,000 m/z. 
The LC-MS data were analysed using HDExaminer 1.2.0 (Sierra Analytics) with 
manual checking of each peptide afterwards. 
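
The zero time point and fully deuterated controls described above are commonly used to normalise the measured mass shift of each peptide, so that exchange is reported as a percentage of the maximum observable uptake. The sketch below shows that normalisation together with the deuterium fraction of the labelling mixture implied by the stated volumes. It is a generic illustration, not the HDExaminer calculation, and the function names and peptide masses are hypothetical.

```python
def label_deuterium_fraction(sample_ul, deuterated_buffer_ul, purity):
    """Deuterium fraction of the mixture after diluting the sample into labelled buffer."""
    return deuterated_buffer_ul * purity / (sample_ul + deuterated_buffer_ul)


def percent_exchange(mass_t, mass_zero, mass_full):
    """Exchange at time t as a percentage of the fully deuterated control.

    mass_zero is the centroid mass of the undeuterated (zero time point) peptide
    and mass_full that of the completely deuterated control.
    """
    span = mass_full - mass_zero
    if span <= 0:
        raise ValueError("fully deuterated mass must exceed the undeuterated mass")
    return 100.0 * (mass_t - mass_zero) / span


# Volumes and purity from the text: 5 µl sample, 15 µl buffer, 99.96% purity.
fraction = label_deuterium_fraction(5.0, 15.0, 0.9996)   # about 0.75

# Hypothetical peptide centroid masses (Da) for one time point.
example_percent = percent_exchange(1002.5, 1000.0, 1005.0)
```

Normalising against a fully deuterated control of the same peptide compensates for deuterium lost during quenching and chromatography, which would otherwise make flexible regions look less exchanged than they are.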
Limited proteolysis. 8-10 µg of deglycosylated eE2 protein was mixed with trypsin, 
chymotrypsin or GluC at 1:120 (w/w) ratio (endopeptidase:E2) and incubated 
at room temperature. Samples were taken at noted time points and analysed by 
reducing SDS-PAGE, mass spectrometry and N-terminal sequencing. 
Production of monoclonal antibody 2A12. Six-to-eight-week-old female BALB/c 
mice were immunized intraperitoneally with 50 µg eE2 in either complete 
Freund’s adjuvant (first immunization only), or incomplete Freund’s adjuvant 
bi-weekly for 8 weeks. A final immunization with 50 µg of eE2 was given intravenously 
4 days before collection of splenocytes. Hybridomas were generated using a 
cloned HAT-sensitive mouse myeloma cell line as a fusion partner. Proliferating 
hybridomas were screened for their ability to bind eE2 via ELISA, at which point 
2A12 was positively identified. Monoclonal antibodies were generated in the 
laboratory of A. Grakoui (IACUC protocol number YER-2002369-070816GN). 
Generation and purification of 2A12 Fab. Hybridoma cells were expanded 
to a final volume of 2 l in spinner flasks at 100 r.p.m. using Iscove’s Modified 
Dulbecco’s Medium, 10% ultra-low IgG FBS, 1% A/A, and 10 mM HEPES (Life 
Technologies). Cells were collected at 2-3 × 10⁶ cells per ml, centrifuged for 
10 min at 7,000g, filtered through a 0.22-µm membrane, and loaded onto a 
Protein G column (GE Healthcare Life Sciences). After completion, the column 
was washed with 20 mM sodium phosphate (pH 7.0) followed by phosphate-buffered 
saline (PBS). The antibody was eluted with 0.05% TFA in 2 ml fractions into 
tubes containing 100 µl of 1 M Tris pH 7.5 for immediate pH neutralization. The 
eluted antibody was dialysed into 20 mM sodium phosphate pH 7.0 and 10 mM 
EDTA. Insoluble papain was added to the antibody at 0.15 mg per 1 mg of antibody. 
Freshly prepared L-cysteine was added to the reaction to a final concentration of 
20 mM and mixed at 37 °C for 2h. The papain was removed by centrifugation at 
3,500g for 2 min and filtration through a 0.22-µm membrane. Fab was purified by 
subtractive chromatography over Protein A FF column and desalted into 20 mM 
Tris pH 8.0. 
Sequencing Ig H and L chain gene segments of 2A12 antibody. Total RNA 
isolated from 2A12 hybridoma cells was reverse transcribed into cDNA using 
random hexamers. Expressed heavy (H) and light (L) chains were amplified using 
standard primers that are complementary to all murine H and L chain gene segments”. 
The PCR products were sequenced either directly or following cloning into pCR 
2.1-TOPO vector (Life Technologies). 
CD81 purification and binding assays. Human CD81-LEL (residues 112-202) 
was produced as a fusion with C-terminal ProtA tag in HEK293T cells using the 
same lentiviral expression system described for eE2. Cell culture supernatants were 
loaded onto an IgG FF column, washed with 20 mM sodium phosphate pH 7.0, 
eluted with 100 mM sodium citrate pH 3.0 containing 20 mM KCl and immedi- 
ately neutralized with 1 M Tris pH 9.0. The ProtA tag was cleaved by PreScission 
Protease in a ratio of 1:50 (w/w) followed by overnight dialysis in 20 mM HEPES 
pH 7.5, 250 mM NaCl, and 5% glycerol. High-purity CD81 protein was obtained 
by anion exchange and size-exclusion chromatography. 

For binding studies, a 96-well plate (Nalgene Nunc, Thermo Fisher Scientific) 
was coated with 50 µg of CD81-LEL overnight at 4 °C. All experiments were 
duplicated against BSA as a negative control. Plates were washed three times with 
PBS containing 0.05% Tween 20 (PBS-T) and blocked with 3% (w/v) BSA in PBS-T 
for 1 h at room temperature. 50 µl of eE2 or E2 core at different concentrations was 
added to appropriate wells and incubated overnight at 4°C. On day 3, the wells 
were washed three times with PBS-T and incubated with monoclonal antibody 
2A12 cell supernatant for 1 h at room temperature. Plates were washed three times 
with PBS-T and incubated with anti-mouse-HRP conjugated antibody for 1h at 
room temperature. Finally, the plate was washed five times with PBS-T. 50 µl of 
TMB substrate (ThermoFisher Scientific) was added to each well and incubated for 
5 min, followed by the addition of 50 µl of 2 M sulphuric acid to stop the reaction. 
Absorbance readings were acquired at 450 nm using Softmax Pro software on a 
Spectra Max 250 (Molecular Devices). 

Neutralization assay. Huh-7.5 cells were maintained in DMEM containing 10% 
FBS (Hyclone) and 100 µg ml⁻¹ of penicillin and streptomycin (Cellgro) at 37 °C 
in 5% CO₂. Naive Huh-7.5 cells were seeded at 6,000 cells per well in a 96-well 
plate. The following day, 100 µl of 2C1, 2A12, or H113 serially diluted in complete 
media were added per well at various concentrations. In parallel, 100 µl of eE2, E2 
core, gp140, or CD81-LEL serially diluted in complete DMEM were added at varying 
concentrations beginning at 100 µg ml⁻¹. Cells were then infected with 100 µl of 
genotype 2a virus Cp7 encoding the Renilla luciferase gene*. 72 h after infection, 
relative light units were measured on a Clarity 4.0 luminometer (Biotek) using the 
Renilla Luciferase Assay System (Promega). 

Assessment of cellular cytotoxicity. Huh-7.5 cells were incubated with varying 
concentrations of protein as described above, beginning at 100 µg ml⁻¹. After 72 h, 
cells were washed once with PBS, treated with trypsin, and collected in 100 µl PBS. 
Cells were stained with 7-AAD according to the manufacturer’s instructions (BD 
Biosciences) and analysed using a BD LSR II and FlowJo software (Tree Star). 
Human plasma ELISA. 96-well enzyme immunoassay plates (ThermoFisher 
Scientific) were coated overnight at 4 °C with 50 µl of a 1 µg ml⁻¹ solution of eE2 
or E2 core diluted in 0.1 M Na₂CO₃. Plates were washed twice with PBS-T and then 
blocked for 1 h at 37 °C in PBS-T containing 10% fetal calf serum (HyClone). Blood 
samples were collected in heparin tubes (Becton Dickinson) and plasma was isolated 
and frozen at −80 °C. Plasma was serially diluted in a binding solution composed 
of 0.1% (v/v) normal goat serum in PBS-T (Jackson ImmunoResearch 
Laboratories). 100 µl of sample were added per well and incubated at room temperature 
for 90 min. After eight washes with PBS, 100 µl of mouse anti-human IgG 
biotin antibody (Mabtech) diluted 1:20,000 in binding solution were added per 
well and incubated 1 h at room temperature. Following five additional washes with 
PBS, 100 µl streptavidin-horseradish peroxidase (HRP) conjugate was added to 
each well at a 1:2,000 dilution in binding buffer and incubated for 45 min at room 
temperature (Mabtech). Absorbance was measured and analysed using a VersaMax 
microplate reader and SoftMax Pro software (Molecular Devices) following five 
washes and the addition of tetramethylbenzidine substrate solution (eBioscience). 
Human sera were isolated from whole-blood samples, and informed consent was 
obtained for all subjects (IRB no. 1358-2004, Emory University School of Medicine, 
principal investigator A. Grakoui). 

Alignment. Secondary structures were assigned using the program DSSP*". Se- 
quences were obtained from the National Center for Biotechnology Information 
(NCBI) using the following accession numbers: J6 ADV40003.1, H77 ACA53555.1, 
J8 P26661.3, S52 AEB71616.2, ED43 AEB71617.2, SA13 AEB71618.2, HK6a 
AEB71619.2, QC69 ACM69041.1. The E2 sequences were aligned with multiple 
alignment using fast Fourier transform (MAFFT)” and edited for figure genera- 
tion using JalView version 2 (ref. 43). 
Extended Data Figure 1 | eE2, eE2(ΔHVR1) and E2 core are highly soluble 
and monomeric in solution. a, A comparison of proteins under reducing and 
non-reducing conditions is shown by a 10% SDS-PAGE gel with protein 
standards (Std). b, Size-exclusion chromatography of eE2, eE2(ΔHVR1) and 
E2 core proteins on a Superdex200 gel filtration column. The elution positions 
of the void volume (>200 kDa), albumin (66 kDa) and cytochrome C 
(12.4 kDa) are indicated. Molecular masses of eE2, eE2(ΔHVR1) and E2 core 
are ~46 kDa, ~42 kDa and ~32 kDa, respectively. 
Extended Data Figure 2 | eE2 sequence alignment. Red bent arrows indicate 
the N- and C-terminal boundaries of the E2 crystallization construct. Cylinders 
and arrows represent α-helices and β-strands, respectively, and are coloured 
according to cartoon representation in Fig. 1. CD81 binding regions are 
bracketed in red; hypervariable regions are bracketed in black. SR-BI binds to 
HVR1. The asterisks indicate the location of trypsin (blue), chymotrypsin 
(green) and GluC (magenta) cleavage sites. The binding sites of neutralizing 
antibodies for which structural information is available are coloured orange for 
HCV1 and AP33, blue for mAb 8, and purple for HC34-1 and HC34-17. 
Extended Data Figure 3 | Hydrogen deuterium exchange and limited 
proteolysis of eE2. a, The percentage hydrogen deuterium exchange shown at 
10, 100 and 1,000 s time points. The secondary structure of E2 core is placed 
above to emphasize flexible regions. A red arrow indicates the E2 core N 
terminus. Extra residues (grey) on N and C terminus come from the vector. 
Potential cleavage sites for trypsin (blue), chymotrypsin (green) and GluC 
(magenta) are indicated by asterisks. The colour pattern indicates the 
percentage of exchange. Grey areas are the regions of no coverage. b, Digestion 
of deglycosylated eE2 with chymotrypsin (left) and GluC (right) reveals a shift 
from the ~35-kDa untreated protein (0 min) to ~25 kDa after digestion. 
Samples were taken at the indicated time points and analysed by reducing 
12% SDS-PAGE gel. Molecular mass protein standards (Std) are indicated. 
The bands were analysed by N-terminal sequencing and mass spectrometry. 
Extended Data Figure 4 | Functional analyses of eE2 and E2 core. 
a, Antibodies in sera from patients infected with HCV genotype 2 show a 
concentration-dependent binding to eE2 (red) whereas healthy donor sera 
exhibit only background binding (black). b, Similar binding is observed for E2 
core. The measurements were done in triplicate with the error bars representing 
the standard error of the mean (s.e.m.). c, E2 core (light grey) shows reduced 
binding to CD81 when compared to eE2 (dark grey) by an ELISA. Bars with 
stripes indicate E2 binding to a negative control, BSA. The solid black bar 
indicates CD81 binding to PBS, used to verify the absence of background. The 
measurements were done in triplicate with the error bars representing the s.e.m. 
d, eE2 (blue) and CD81-LEL (positive control, grey) inhibit the infection. E2 
core (red) shows reduced inhibition. HIV gp140 (black) expressed in the same 
system was used as a negative control. The measurements were done in 
triplicate with the error bars representing the s.e.m. e, To rule out the possibility 
of toxic effects from the recombinant proteins, the cell viability was measured as 
described in Methods, using similar protein concentrations as in d. f, In an 
ELISA, 2A12 (red), and an irrelevant antibody, H113 (grey), fail to neutralize 
HCVcc infection. 2C1 (positive control, black), a mouse monoclonal antibody 
that binds to the disordered N-terminal region of eE2, blocks infection. The 
measurements were done in triplicate with the error bars representing the s.e.m. 
Extended Data Figure 5 | E2 core contains an extensive hydrophobic core. 
Sheets A and B are held together by an extensive hydrophobic core 
composed of mostly aromatic amino acids (green) and five disulphide bonds 
(yellow). 
Extended Data Figure 6 | eE2 and E2 core do not undergo oligomeric 
changes at low pH. a, b, An overlay of E2 core (a) and eE2 (b) elution profiles 
from Superdex200 gel filtration at pH 7.5 (blue) and pH 5.0 (red). The expected 
void volume and observed elution positions of individual proteins are 
indicated. c, The SAXS envelope of CD81-LEL fit with a dimer crystal structure 
(PDB 1G8Q). The individual proteins of the CD81-LEL dimer are coloured red 
and blue. 
Extended Data Figure 7 | Epitope mapping of conformational antibodies 
on E2 core surface. a, Surface epitopes of AR1 (orange) are shown. AR1 blocks 
the E1E2 heterodimer binding to CD81. AR5A (purple) inhibits E1E2 
heterodimerization and is mapped on a well-conserved hydrophobic surface of 
the core. b, Surface of E2 core coloured by electrostatic potential. The view in 
a and b is identical. 
Extended Data Table 1 | Summary of the X-ray crystallographic analyses 


* Highest resolution shell is shown in parenthesis. 
Extended Data Table 2 | Summary of SAXS analyses 
CD81 18.8 ± 0.04 
E2 core pH 5.0 27.8 ± 0.07 
eE2 pH 5.0 27.8 ± 0.07 
Organisms are defined by the information encoded in their genomes, 
and since the origin of life this information has been encoded using a 
two-base-pair genetic alphabet (A-T and G-C). In vitro, the alphabet 
has been expanded to include several unnatural base pairs (UBPs). 
We have developed a class of UBPs formed between nucleotides bearing 
hydrophobic nucleobases, exemplified by the pair formed between 
d5SICS and dNaM (d5SICS-dNaM), which is efficiently PCR-amplified 
and transcribed in vitro, and whose unique mechanism of replication 
has been characterized. However, expansion of an organism's 
genetic alphabet presents new and unprecedented challenges: 
the unnatural nucleoside triphosphates must be available inside the 
cell; endogenous polymerases must be able to use the unnatural tri- 
phosphates to faithfully replicate DNA containing the UBP within 
the complex cellular milieu; and finally, the UBP must be stable in 
the presence of pathways that maintain the integrity of DNA. Here 
we show that an exogenously expressed algal nucleotide triphosphate 
transporter efficiently imports the triphosphates of both d5SICS and 
dNaM (d5SICSTP and dNaMTP) into Escherichia coli, and that the 
endogenous replication machinery uses them to accurately replicate 
a plasmid containing d5SICS-dNaM. Neither the presence of the 
unnatural triphosphates nor the replication of the UBP introduces a 
notable growth burden. Lastly, we find that the UBP is not efficiently 
excised by DNA repair pathways. Thus, the resulting bacterium is the 
first organism to propagate stably an expanded genetic alphabet. 

To make the unnatural triphosphates available inside the cell, we previously 
suggested using passive diffusion of the free nucleosides into the 
cytoplasm followed by their conversion to the corresponding triphosphate 
via the nucleoside salvage pathway. Although we have shown 
that analogues of d5SICS and dNaM are phosphorylated by the nucleoside 
kinase from Drosophila melanogaster, monophosphate kinases 
are more specific, and in E. coli we found that overexpression of the 
endogenous nucleoside diphosphate kinase results in poor growth. As 
an alternative, we focused on the nucleotide triphosphate transporters 
(NTTs) of obligate intracellular bacteria and algal plastids. We 
expressed eight different NTTs in E. coli C41(DE3) and measured 
the uptake of [α-³²P]-dATP as a surrogate for the unnatural triphosphates 
(Extended Data Fig. 1). We confirmed that [α-³²P]-dATP is efficiently 
transported into cells by the NTTs from Phaeodactylum tricornutum 
(PtNTT2) and Thalassiosira pseudonana (TpNTT2). Although NTTs 
from Protochlamydia amoebophila (PamNTT2 and PamNTT5) also 
import [α-³²P]-dATP, PtNTT2 showed the most activity, and both it 
and TpNTT2 are known to have broad specificity, making them the 
most promising NTTs for further characterization. 

Transport via an NTT requires that the unnatural triphosphates are 
sufficiently stable in culture media; however, preliminary characterization 
of d5SICSTP and dNaMTP indicated that decomposition occurs 
in the presence of actively growing E. coli (Extended Data Fig. 2). Similar 
behaviour was observed with [α-³²P]-dATP, and the dephosphorylation 
products detected by thin-layer chromatography (TLC) for [α-³²P]-dATP, 
or by high-performance liquid chromatography (HPLC) and matrix-assisted 
laser desorption/ionization (MALDI) for d5SICSTP and dNaMTP, 
suggest that decomposition is mediated by phosphatases. As no degradation 
was observed upon incubation in spent media, decomposition 
seems to occur within the periplasm. No increase in stability was observed 
in cultures of single-gene-deletion mutants of E. coli BW25113 lacking 
a specific periplasmic phosphatase (as identified by the presence of a 
Sec-type amino-terminal leader sequence), including phoA, ushA, appA, 
aphA, yijX, surE, yfbR, yjjG, yfaO, mutT, nagD, yggV, yrfG or ymfB, 
suggesting that decomposition results from the activity of multiple 
phosphatases. However, the extracellular stability of [α-³²P]-dATP was 
significantly greater when 50 mM potassium phosphate (KPi) was added to 
the growth medium (Extended Data Fig. 3). Thus, we measured [α-³²P]-dATP 
uptake from media containing 50 mM KPi after induction of the 
transporter with isopropyl-β-D-thiogalactoside (IPTG) (Extended Data 
Fig. 4). Although induction with 1 mM IPTG resulted in slower growth, 
consistent with the previously reported toxicity of NTTs, it also resulted 
in maximal [α-³²P]-dATP uptake. Thus, after addition of 1 mM IPTG, 
we analysed the extracellular and intracellular stability of [α-³²P]-dATP 
as a function of time (Extended Data Fig. 5). Cells expressing PtNTT2 
were found to have the highest levels of intracellular [α-³²P]-dATP, and 
although both extra- and intracellular dephosphorylation was still observed, 
the ratio of triphosphate to dephosphorylation products inside the cell 
remained roughly constant, indicating that the extracellular concentrations 
and PtNTT2-mediated influx are sufficient to compensate for 
intracellular decomposition. 

Likewise, we found that the addition of KPi increased the extracellular 
stability of d5SICSTP and dNaMTP (Extended Data Fig. 2), and 


[Figure 1 image: a, chemical structures (d5SICS-dNaM); b, composition (%) of 3P, 2P, 1P and 0P species in the media and cytoplasm.] 


Figure 1 | Nucleoside triphosphate stability and import. a, Chemical 
structure of the d5SICS-dNaM UBP compared to the natural dG-dC base pair. 
b, Composition analysis of d5SICS and dNaM in the media (top) and 
cytoplasmic (bottom) fractions of cells expressing PtNTT2 after 30 min 
incubation; dA shown for comparison. 3P, 2P, 1P and 0P correspond to 
triphosphate, diphosphate, monophosphate and nucleoside, respectively; [3P] 
is the intracellular concentration of triphosphate. Error bars represent s.d. 
when a stationary phase culture was diluted 100-fold into fresh media, 
the half-lives of both unnatural triphosphates (initial concentrations of 
0.25 mM) were found to be approximately 9 h, which seemed sufficient 
for our purposes. Thirty minutes after their addition to the media, neither 
of the unnatural triphosphates was detected in cells expressing TpNTT2; 
in contrast, 90 µM of d5SICSTP and 30 µM of dNaMTP were found in 
the cytoplasm of cells expressing PtNTT2 (Fig. 1b). Although intracellular 
decomposition was still apparent, the intracellular concentrations 
of intact triphosphate are significantly above the sub-micromolar KM 
values of the unnatural triphosphates for DNA polymerases, setting 
the stage for replication of the UBP in a living bacterial cell. 

The replication of DNA containing d5SICS-dNaM has been validated 
in vitro with different polymerases, primarily family A polymerases, such 
as the Klenow fragment of E. coli DNA polymerase I (pol I). As the 
majority of the E. coli genome is replicated by pol III, we engineered a 
plasmid to focus replication of the UBP to pol I. Plasmid pINF (the 
information plasmid) was constructed from pUC19 using solid-phase DNA 
synthesis and circular-extension PCR to replace the dA-dT pair at 
position 505 with dNaM paired opposite an analogue of d5SICS (dTPT3) 
(Fig. 2a, b). This positions the UBP 362 bp downstream of the ColE1 
origin of replication where leading-strand replication is mediated by 
pol I, and within the TK-1 Okazaki processing site, where lagging-strand 
synthesis is also expected to be mediated by pol I. Synthetic 
pINF was constructed using the d5SICS analogue because it should be 
efficiently replaced by d5SICS if replication occurs in vivo, making it 
possible to differentiate in vivo replicated pINF from synthetic pINF. 

To determine whether E. coli can use the imported unnatural triphosphates 
to stably propagate pINF, C41(DE3) cells were first transformed 
with a pCDF-1b plasmid encoding PtNTT2 (hereafter referred to as pACS, 
for accessory plasmid, Fig. 2a) and grown in media containing 0.25 mM 
of both unnatural triphosphates, 50 mM KPi and 1 mM IPTG to induce 
transporter production. Cells were then transformed with pINF, and after 
a 1-h recovery period, cultures were diluted tenfold with the same media 
supplemented with ampicillin, and growth was monitored via culture 
turbidity (Extended Data Table 1). As controls, cells were also transformed 
with pUC19, or grown either without IPTG or without the unnatural 
triphosphates. Again, growth was significantly slower in the presence 
of IPTG, but the addition of d5SICSTP and dNaMTP resulted in only a 
slight further decrease in growth in the absence of pINF, and interestingly, 
it eliminated a growth lag in the presence of pINF (Fig. 2c), 
suggesting that the unnatural triphosphates are not toxic and are required 
for the efficient replication of pINF. 

To demonstrate the replication of pINF, we recovered the plasmid 
from cells after 15 h of growth. The introduction of the UBP resulted in 


Figure 2 | Intracellular UBP replication. a, Structure of pACS and pINF. 
dX and dY correspond to dNaM and a d5SICS analogue that facilitated 
plasmid construction (see Methods). cloDF, origin of replication; 
Sm, streptomycin resistance gene; AmpR, ampicillin resistance gene; ori, 
ColE1 origin of replication; lacZα, β-galactosidase fragment gene. b, Overview 
of pINF construction. A DNA fragment containing the unnatural nucleotide 
was synthesized via solid-phase DNA synthesis and then used to assemble 
synthetic pINF via circular-extension PCR. X, dNaM; Y′, dTPT3 (an analogue 
of d5SICS); Y, d5SICS (see text). Colour indicates regions of homology. The 
doubly nicked product was used directly to transform E. coli harbouring pACS. 
c, The addition of d5SICSTP and dNaMTP eliminates a growth lag of cells 
harbouring pINF. EP, electroporation. Error bars represent s.d. of the mean, 
n = 3. d, LC-MS/MS total ion chromatogram of global nucleoside content in 
pINF and pUC19 recorded in dynamic multiple reaction monitoring (DMRM) 
mode. pINF and pUC19 (control) were propagated in E. coli in the presence or 
absence of unnatural triphosphates, and with or without PtNTT2 induction. 
The inset shows a 100-fold expansion of the mass-count axis in the d5SICS 
region. e, Biotinylation only occurs in the presence of the UBP, the unnatural 
triphosphates and transporter induction. After growth, pINF was recovered, 
and a 194-nucleotide region containing the site of UBP incorporation 
(nucleotides 437-630) was amplified and biotinylated. B, biotin; SA, 
streptavidin. The natural pUC19 control plasmid was prepared identically to 
pINF. A 50-bp DNA ladder is shown to the left. f, Sequencing analysis 
demonstrates retention of the UBP. An abrupt termination in the Sanger 
sequencing reaction indicates the presence of UBP incorporation (site indicated 
with arrow). 
a small (approximately twofold) reduction in the copy number of pINF, 
as gauged by its ratio to pACS (Extended Data Table 1); we determined 
that the plasmid was amplified 2 × 10⁷-fold during growth (approximately 
24 doublings) based on the amount of recovered plasmid and 
the transformation efficiency. To determine the level of UBP retention, 
the recovered plasmid was digested, dephosphorylated to single nucleosides, 
and analysed by liquid chromatography-tandem mass spectrometry 
(LC-MS/MS). Although the detection and quantification of dNaM 
were precluded by its poor fragmentation efficiency and low product 
ion counts over background, signal for d5SICS was clearly observable 
(Fig. 2d). External calibration curves were constructed using the unnatural 
nucleoside and validated by determining its ratio to dA in synthetic 
oligonucleotides (Extended Data Table 2). Using the resulting calibra- 
tion curve, we determined the ratio of dA to d5SICS in recovered pINF 
was 1,106 to 1, which when compared to the expected ratio of 1,325 
to 1, suggests the presence of approximately one UBP per plasmid. No 
d5SICS was detected in control experiments in which the transporter 
was not induced, or when the unnatural triphosphates were not added 
to the media, or when pUC19 was used instead of pINF (Fig. 2d, inset), 
demonstrating that its presence results from the replication of the UBP 
and not from misinsertion of the unnatural triphosphates opposite a natural 
nucleotide. Importantly, as the synthetic pINF contained an analogue 
of d5SICS, and d5SICS was only provided as a triphosphate added to 
the media, its presence in pINF confirms in vivo replication. 
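
The ratio comparison above can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the expected ratio of 1,325 to 1 corresponds to exactly one UBP per plasmid, so that the copy estimate is simply the expected ratio divided by the observed one.

```python
def ubp_per_plasmid(expected_da_per_ubp: float, observed_da_per_ubp: float) -> float:
    """Estimate UBP copies per plasmid from dA:d5SICS ratios.

    Assumption: expected_da_per_ubp is the dA:d5SICS ratio for exactly one
    UBP per plasmid, so a lower observed ratio means more d5SICS per dA.
    """
    if expected_da_per_ubp <= 0 or observed_da_per_ubp <= 0:
        raise ValueError("ratios must be positive")
    return expected_da_per_ubp / observed_da_per_ubp


# Ratios quoted in the text: expected 1,325:1, observed 1,106:1.
estimate = ubp_per_plasmid(1325, 1106)
print(f"{estimate:.2f} UBP per plasmid")
```

A value close to 1 is what "approximately one UBP per plasmid" means here; the calculation says nothing about measurement uncertainty.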

To independently confirm and quantify the retention of the UBP in 
the recovered plasmid, the relevant region was amplified by PCR in the 
presence of d5SICSTP and a biotinylated dNaMTP analogue (Fig. 2e). 
Analysis by streptavidin gel shift showed that 67% of the amplified DNA 
contained biotin. No shift was observed in control experiments where 
the transporter was not induced, or when unnatural triphosphates were 
not added, or when pUC19 was used instead of pINF, demonstrating 
that the shift results from the presence of the UBP. Based on a calibration 
curve constructed from the shifts observed with the amplification 
products of controlled mixtures of DNA containing dNaM or its fully 
natural counterpart (Methods and Extended Data Fig. 6), the observed 
gel shift corresponds to a UBP retention of 86%. Similarly, when the 
amplification product obtained with d5SICSTP and dNaMTP was analysed 
by Sanger sequencing in the absence of the unnatural triphosphates, 
the sequencing chromatogram showed complete termination at the position 
of UBP incorporation, which with an estimated lower limit of read-through 
detection of 5%, suggests a level of UBP retention in excess of 
95% (Fig. 2f). In contrast, amplification products obtained from pINF 
recovered from cultures grown without PtNTT2 induction, without added 
unnatural triphosphates, or obtained from pUC19 propagated under 
identical conditions, showed no termination. Overall, the data unambiguously 
demonstrate that DNA containing the UBP was replicated 
in vivo and allow us to estimate that replication occurred with fidelity 
(retention per doubling) of at least 99.4% (24 doublings; 86% retention; 
0.994²⁴ = 0.86). This fidelity corresponds to an error rate of approximately 
10⁻³, which is comparable to the intrinsic error rate of some polymerases 
with natural DNA. 
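
The fidelity estimate is simple arithmetic and can be reproduced as follows. This is a minimal sketch with hypothetical function names; it assumes that retention is lost by the same fraction at every doubling, which is the assumption implicit in taking the 24th root of the overall retention.

```python
import math


def doublings_from_amplification(fold: float) -> float:
    """Number of doublings needed to reach a given fold amplification."""
    return math.log2(fold)


def retention_per_doubling(total_retention: float, doublings: float) -> float:
    """Per-doubling retention, assuming an equal fractional loss at every doubling."""
    return total_retention ** (1.0 / doublings)


# Values quoted in the text: 2e7-fold amplification, 86% retention, 24 doublings.
doublings = doublings_from_amplification(2e7)
fidelity = retention_per_doubling(0.86, 24)
error_rate = 1.0 - fidelity

print(f"doublings ~ {doublings:.1f}")
print(f"retention per doubling ~ {fidelity:.4f}")
print(f"error rate per doubling ~ {error_rate:.4f}")
```

Running the calculation in the forward direction, 0.994 raised to the 24th power gives roughly 0.86, in line with the retention measured by gel shift.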

The high retention of the UBP over a 15-h period of growth (approximately 
24 doublings) strongly suggests that it is not efficiently excised 
by DNA repair pathways. To test further this hypothesis and to examine 
retention during prolonged stationary phase growth, we repeated 
the experiments, but monitored UBP retention, cell growth and unnatural 
triphosphate decomposition for up to 6 days without providing any 
additional unnatural triphosphates (Fig. 3 and Extended Data Fig. 7). At 
15 and 19 h of growth, the cultures reached an optical density at 600 nm 
(OD600) of approximately 0.9 and 1.2, respectively, and both d5SICSTP 
and dNaMTP decomposed to 17-20% and 10-16% of their initial 0.25-mM 
concentrations (Extended Data Fig. 7a). In agreement with the experiments 
described above, retention of the UBP after 15 h was 97 ± 5% and >95%, 
as determined by gel shift and sequencing, respectively, and after 19 h it 
was 91 ± 3% and >95%. As the cultures entered stationary phase and the 
triphosphates decomposed completely, plasmid loss began to compete 
Figure 3 | Intracellular stability of the UBP. E. coli C41(DE3) pACS was 
transformed with pINF and grown after a single dose of d5SICSTP and 
dNaMTP was provided in the media. UBP retention in recovered pINF, OD600, 
and relative amount of d5SICSTP and dNaMTP in the media (100% = 0.25 
mM), were determined as a function of time. Error bars represent s.d. of the 
mean, n = 3. 


with replication (Extended Data Fig. 7b, c, d), but even then, retention 
of the UBP remained at approximately 45% and 15%, at days 3 and 6 
respectively. Moreover, when d5SICS-dNaM was lost, it was replaced 
by dA-dT, which is consistent with the mutational spectrum of DNA 
pol I. Finally, the shape of the retention versus time curve mirrors that 
of the growth versus time curve. Taken together, these data suggest that 
in the absence of unnatural triphosphates, the UBP is eventually lost 
by replication-mediated mispairing, and not from the activity of DNA 
repair pathways. 

We have demonstrated that PtNTT2 efficiently imports d5SICSTP 
and dNaMTP into E. coli and that an endogenous polymerase, possibly 
pol I, uses the unnatural triphosphates to replicate DNA containing 
the UBP within the cellular environment with reasonable efficiency 
and fidelity. Moreover, the UBP appears stable during both exponential 
and stationary phase growth despite the presence of all DNA repair 
mechanisms. Remarkably, although expression of PtNTT2 results in a 
somewhat reduced growth rate, neither the unnatural triphosphates nor 
replication of the UBP results in significant further reduction in growth. 
The resulting bacterium is the first organism that stably harbours DNA 
containing three base pairs. In the future, this organism, or a variant with 
the UBP incorporated at other episomal or chromosomal loci, should 
provide a synthetic biology platform to orthogonally re-engineer cells, 
with applications ranging from site-specific labelling of nucleic acids 
in living cells to the construction of orthogonal transcription networks 
and eventually the production and evolution of proteins with multiple, 
different unnatural amino acids. 


METHODS SUMMARY 


To prepare electrocompetent C41(DE3) pACS cells, freshly transformed E. coli 
C41(DE3) pACS was grown overnight in 2× YT medium (1.6% tryptone, 1% yeast 
extract, 0.5% NaCl) supplemented with streptomycin and KPi. After 100-fold 
dilution into the same medium and outgrowth at 37 °C to OD600 = 0.20, IPTG was added 
to induce expression of PtNTT2. After 40 min, cultures were rapidly cooled, washed 
with sterile water and resuspended in 10% glycerol. An aliquot of electrocompetent 
cells was mixed with pINF and electroporated. Pre-warmed 2× YT medium 
containing streptomycin, IPTG and KPi was added, and an aliquot was diluted 3.3-fold 
in the same media supplemented with 0.25 mM each of dNaMTP and d5SICSTP. 
The resulting mixture was allowed to recover at 37 °C with shaking. After recovery, 
cultures were centrifuged. Spent media was analysed for nucleotide composition by 
HPLC (Extended Data Fig. 7a); cells were resuspended in fresh medium containing 
streptomycin, ampicillin, IPTG, KPi and 0.25 mM each of dNaMTP and d5SICSTP, 
and grown with shaking. At defined time points, OD600 was determined and 
aliquots were removed and centrifuged. Spent media were analysed for nucleotide 
composition, and pINF was recovered by spin column purification. UBP retention 
was characterized by LC-MS/MS, PCR amplification and gel electrophoresis, or 
sequencing, as described in the Methods. 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS 


Materials. 2× YT, 2× YT agar, IPTG, ampicillin and streptomycin were obtained 
from Fisher Scientific. Ampicillin and streptomycin were used at 100 µg ml⁻¹ and 
50 µg ml⁻¹, respectively. All pET-16b constructs containing the nucleotide 
transporters were kindly provided by I. Haferkamp (Technische Universität 
Kaiserslautern, Germany) with the exception of pET16b-RpNTT2, which along with the 
C41(DE3) E. coli strain, was provided by J. P. Audia (University of South Alabama, 
USA). Plasmids pUC19 and pCDF-1b were obtained from Thermo Scientific and 
EMD Millipore, respectively. Plasmids were purified using the PureLink Quick 
Plasmid DNA Miniprep Kit (Life Technologies). OneTaq, DeepVent, Q5 Hot Start 
High-Fidelity DNA Polymerases, and all restriction endonucleases were obtained 
from New England Biolabs. In general, PCR reactions were divided into multiple 
aliquots with one followed in real time using 0.5× Sybr Green I (Life Technologies); 
following PCR, the aliquots were recombined, purified by spin column (DNA Clean 
and Concentrator-5; Zymo Research, Irvine, California, USA) with elution in 20 µl 
of water, then separated by agarose gel electrophoresis, followed by band excision 
and recovery (Zymoclean Gel DNA Recovery Kit), eluting with 20 µl of water unless 
stated otherwise. Polyacrylamide gels were stained with 1× Sybr Gold (Life 
Technologies) for 30 min; agarose gels were cast with 1× Sybr Gold. All gels were 
visualized using a Molecular Imager Gel Doc XR+ equipped with 520DF30 filter (Bio-Rad) 
and quantified with Quantity One software (Bio-Rad). The sequences of all DNA 
oligonucleotides used in this study are provided in Supplementary Information. 
Natural oligonucleotides were purchased from IDT (San Diego, California, USA). 
The concentration of dsDNA was measured by fluorescent dye binding (Quant-iT 
dsDNA HS Assay kit, Life Technologies) unless stated otherwise. The concentration 
of ssDNA was determined by UV absorption at 260 nm using a NanoDrop 
1000 (Thermo Scientific). [α-³²P]-dATP (25 µCi) was purchased from PerkinElmer 
(Shelton, Connecticut, USA). Polyethyleneimine cellulose pre-coated Bakerflex TLC 
plates (0.5 mm) were purchased from VWR. dNaM phosphoramidite, dNaM and 
d5SICS nucleosides were obtained from Berry & Associates Inc. (Dexter, Michigan, 
USA). Free nucleosides of dNaM and d5SICS (Berry & Associates) were converted to 
the corresponding triphosphates under Ludwig conditions. After purification by anion 
exchange chromatography (DEAE Sephadex A-25) followed by reverse phase (C18) 
HPLC and elution through a Dowex 50WX2-sodium column, both triphosphates 
were lyophilized and kept at −20 °C until use. The d5SICSTP analogue dTPT3TP 
and the biotinylated dNaMTP analogue dMMO2SSBIOTP were made as reported 
previously. MALDI-TOF mass spectrometry (Applied Biosystems Voyager DE-PRO 
System 6008) was performed at the TSRI Center for Protein and Nucleic Acid Research. 
Construction of NTT expression plasmids. The PtNTT2 gene was amplified from 
plasmid pET-16b-PtNTT2 using primers PtNTT2-forward and PtNTT2-reverse; 
the TpNTT2 gene was amplified from plasmid pET-16b-TpNTT2 using primers 
TpNTT2-forward and TpNTT2-reverse. A linear fragment of pCDF-1b was 
generated using primers pCDF-1b-forward and pCDF-1b-reverse. All fragments were 
purified as described in Materials. The pCDF-1b fragment (100 ng, 4.4 × 10⁻¹⁴ mol) 
and either the PtNTT2 (78 ng, 4.4 × 10⁻¹⁴ mol) or TpNTT2 (85 ng, 4.4 × 10⁻¹⁴ mol) 
fragment were then assembled together using restriction-free circular polymerase 
extension cloning in 1× OneTaq reaction buffer, MgSO4 adjusted to 3.0 mM, 
0.2 mM of dNTP, and 0.02 U µl⁻¹ of OneTaq DNA under the following thermal 
cycling conditions: initial denaturation (96 °C, 1 min); 10 cycles of denaturation 
(96 °C, 30 s), touchdown annealing (54 °C to 49.5 °C for 30 s (−0.5 °C per cycle)), 
extension of 68 °C for 5 min, and final extension (68 °C, 5 min). Upon completion, 
the samples were purified and used for heat-shock transformation of E. coli XL10. 
Individual colonies were selected on lysogeny broth (LB)-agar containing 
streptomycin, and assayed by colony PCR with primers PtNTT2-forward/reverse or 
TpNTT2-forward/reverse. The presence of the NTT genes was confirmed by 
sequencing and double digestion with ApaLI/EcoO109I restriction endonucleases with the 
following expected pattern: pCDF-1b-PtNTT2 (2,546/2,605 bp), pCDF-1b-TpNTT2 
(2,717/2,605 bp), pCDF-1b (1,016/2,605 bp). The complete nucleotide sequence 
of the pCDF-1b-PtNTT2 plasmid (pACS) is provided in Supplementary Information. 
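
As a quick consistency check on the touchdown annealing step described above, the per-cycle temperatures can be listed explicitly. The helper below is a hypothetical illustration, not part of any thermocycler software; it only confirms that 10 cycles at −0.5 °C per cycle starting from 54 °C end at 49.5 °C.

```python
def touchdown_annealing(start_c: float, step_c: float, cycles: int) -> list[float]:
    """Annealing temperature for each cycle of a touchdown PCR programme."""
    return [start_c + i * step_c for i in range(cycles)]


# Values from the protocol: start at 54 °C, drop 0.5 °C per cycle, 10 cycles.
schedule = touchdown_annealing(54.0, -0.5, 10)
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} °C for 30 s")
```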
Growth conditions to quantify nucleoside triphosphate uptake. E. coli C41(DE3) 
freshly transformed with pCDF-1b-PtNTT2 was grown in 2× YT with streptomycin 
overnight, then diluted (1:100) into fresh 2× YT medium (1 ml of culture per 
uptake with [α-³²P]-dATP; 2 ml of culture per uptake with d5SICSTP or dNaMTP) 
supplemented with 50 mM potassium phosphate (KPi) and streptomycin. A negative 
control with the inactive transporter pET-16b-RpNTT2 was treated identically 
except ampicillin was used instead of streptomycin. Cells were grown to an OD600 
of approximately 0.6 and the NTT expression was induced by the addition of IPTG 
(1 mM). The culture was allowed to grow for another hour (final OD600 approximately 
1.2) and then assayed directly for uptake as described below using a method 
adapted from a previous paper. 

Preparation of media fraction for unnatural nucleoside triphosphate analysis. 
The experiment was initiated by the addition of either dNaMTP or d5SICSTP 
(10 mM each) directly to the media to a final concentration of 0.25 mM. Cells were 
incubated with the substrate with shaking at 37 °C for 30 min and then pelleted 
(8,000 r.c.f. (relative centrifugal force) for 5 min, 4 °C). An aliquot of the media 
fraction (40 µl) was mixed with acetonitrile (80 µl) to precipitate proteins, and 
then incubated at 22 °C for 30 min. Samples were either analysed immediately by 
HPLC or stored at −80 °C until analysis. Analysis began with centrifugation (12,000 
r.c.f. for 10 min at 22 °C), then the pellet was discarded, and the supernatant was 
reduced to approximately 20 µl by SpeedVac, resuspended in buffer A (see below) 
to a final volume of 50 µl, and analysed by HPLC (see below). 

Preparation of cytoplasmic fraction for nucleoside triphosphate analysis. To 
analyse the intracellular dephosphorylation of the unnatural nucleoside 
triphosphate, cell pellets were subjected to 3 × 100 µl washes of ice-cold KPi (50 mM). 
Pellets were then resuspended in 250 µl of ice-cold KPi (50 mM) and lysed with 250 µl 
of lysis buffer L7 of the PureLink Quick Plasmid DNA Miniprep Kit (200 mM 
NaOH, 1% w/v SDS), after which the resulting solution was incubated at 22 °C for 
5 min. Precipitation buffer N4 (350 µl, 3.1 M potassium acetate, pH 5.5) was added, 
and the sample was mixed to homogeneity. Following centrifugation (>12,000 r.c.f. 
for 10 min, at 22 °C) the supernatant containing the unnatural nucleotides was 
applied to a Hypersep C18 solid phase extraction column (Thermo Scientific) 
prewashed with acetonitrile (1 ml) and buffer A (1 ml, see HPLC protocol for buffer 
composition). The column was then washed with buffer A and nucleotides were 
eluted with 1 ml of 50% acetonitrile:50% triethylammonium bicarbonate (TEAB) 
0.1 M (pH 7.5). The eluent was reduced to approximately 50 µl in a SpeedVac and 
its volume was adjusted to 100 µl with buffer A before HPLC analysis. 

HPLC protocol and nucleoside triphosphate quantification. Samples were applied 
to a Phenomenex Jupiter LC column (3 µm C18 300 Å, 250 × 4.6 mm) and subjected 
to a linear gradient of 0-40% B over 40 min at a flow rate of 1 ml min⁻¹. Buffer A: 
95% 0.1 M TEAB, pH 7.5; 5% acetonitrile. Buffer B: 20% 0.1 M TEAB, pH 7.5; 80% 
acetonitrile. Absorption was monitored at 230, 273, 288, 326 and 365 nm. 
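
To make the gradient concrete, the acetonitrile content of the mobile phase at any time point follows from the buffer compositions given above. The sketch below uses hypothetical function names and assumes ideal volumetric mixing of buffers A and B, with the gradient held at its end value after 40 min.

```python
def percent_b(t_min: float, start_b: float = 0.0, end_b: float = 40.0,
              duration_min: float = 40.0) -> float:
    """Percentage of buffer B at time t for a linear gradient (clamped to its ends)."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def acetonitrile_percent(pct_b: float, acn_in_a: float = 5.0,
                         acn_in_b: float = 80.0) -> float:
    """Acetonitrile content (%) of the mixed mobile phase, assuming ideal mixing."""
    frac_b = pct_b / 100.0
    return acn_in_a * (1.0 - frac_b) + acn_in_b * frac_b


for t in (0, 10, 20, 30, 40):
    b = percent_b(t)
    print(f"t = {t:2d} min: {b:4.1f}% B, {acetonitrile_percent(b):4.1f}% acetonitrile")
```

Under these assumptions the gradient runs from 5% to 35% acetonitrile over the 40-min separation.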

Each injection series included two extra control samples containing 5 nmol of 
dNaMTP or d5SICSTP. The areas under the peaks that corresponded to tripho- 
sphate, diphosphate, monophosphate and free nucleoside (confirmed by MALDI- 
TOF) were integrated for both the control and the unknown samples (described 
above). After peak integration, the ratio of the unknown peak to the control peak 
adjusted for the loss from the extraction step (62% and 70% loss for dNaM and 
d5SICS, respectively, Extended Data Table 3), provided a measure of the amount 
of each of the moieties in the sample. To determine the relative concentrations of 
unnatural nucleotide inside the cell, the amount of imported unnatural nucleotide 
(dXTP, mol) was then divided by the volume of cells, which was calculated as the 
product of the volume of a single E. coli cell (1 µm³ based on a reported average value; 
that is, 1 × 10⁻⁹ µl per cell) and the number of cells in each culture (OD600 of 1.0 
equal to 1 × 10⁹ cells per ml (ref. 32)). The RpNTT2 sample was used as a negative 
control and its signal was subtracted to account for incomplete washing of 
nucleotide species from the media. 
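
The quantification described above can be sketched in a few lines. The function names are hypothetical, and the form of the extraction-loss correction (dividing by the recovered fraction) is an assumption, since the text only states that the peak ratio was "adjusted for the loss". The cell volume and the OD600-to-cell-count conversion are the values stated in the text; the 0.216 nmol input is an invented illustration, not a measured value.

```python
CELL_VOLUME_UL = 1e-9        # 1 µm³ per E. coli cell, as stated in the text
CELLS_PER_ML_PER_OD = 1e9    # OD600 of 1.0 taken as 1e9 cells per ml, as stated


def extracted_amount_nmol(peak_unknown: float, peak_control: float,
                          control_nmol: float = 5.0,
                          loss_fraction: float = 0.0) -> float:
    """Amount in the sample from peak areas, corrected for extraction loss.

    Assumption: the correction divides by the recovered fraction (1 - loss).
    """
    return (peak_unknown / peak_control) * control_nmol / (1.0 - loss_fraction)


def intracellular_conc_um(amount_nmol: float, od600: float, culture_ml: float) -> float:
    """Intracellular concentration (µM) from imported amount and total cell volume."""
    cells = od600 * CELLS_PER_ML_PER_OD * culture_ml
    total_cell_volume_ul = cells * CELL_VOLUME_UL
    # nmol per µl equals mM; multiply by 1000 to express the result in µM.
    return amount_nmol / total_cell_volume_ul * 1000.0


# Hypothetical example: 0.216 nmol recovered from 2 ml of culture at OD600 = 1.2.
conc = intracellular_conc_um(0.216, od600=1.2, culture_ml=2.0)
print(f"{conc:.0f} µM")
```

In a real analysis the signal from the RpNTT2 negative control would be subtracted from the amount before converting to a concentration, as described in the text.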
dATP uptake. To analyse the intracellular dephosphorylation of dATP, after 
induction of the transporter, the uptake reaction was initiated by the addition of dATP 
(spiked with [α-³²P]-dATP) to a final concentration of 0.25 mM, followed by 
incubation at 37 °C with shaking for 30 min. The culture was then centrifuged (8,000 
r.c.f. for 5 min at 22 °C). Supernatant was analysed by TLC. Cell pellets were washed 
three times with ice-cold KPi (50 mM, 100 µl) to remove excess radioactive 
substrate, lysed with NaOH (0.2 M, 100 µl) and centrifuged (10,000 r.c.f. for 5 min at 
22 °C) to remove cell debris; supernatant was analysed by TLC. 

TLC analysis. Samples (1 µl) were applied on a 0.5 mm polyethyleneimine cellu- 
lose TLC plate and developed with sodium formate pH 3.0 (0.5 M, 30 s; 2.5 M, 2.5 min; 
4.0 M, 40 min). Plates were dried using a heat gun and quantified by phosphorimaging 
(Storm Imager, Molecular Dynamics) and Quantity One software. 
Optimization of nucleotide extraction from cells for HPLC injection. To min- 
imize the effect of the lysis and triphosphate extraction protocols on the decom- 
position of nucleoside triphosphate within the cell, the extraction procedure was 
optimized for the highest recovery with the lowest extent of decomposition (Extended 
Data Table 3). To test different extraction methods, cells were grown as described 
above, washed, and then 5 nmol of either dNaMTP or d5SICSTP was added to the 
pellets, which were then subjected to different extraction protocols including boil- 
ing water, hot ethanol, cold methanol, freeze and thaw, lysozyme, glass beads, NaOH, 
trichloroacetic acid (TCA) with Freon, and perchloric acid (PCA) with KOH”. 
The recovery and composition of the control was quantified by HPLC as described 
above to determine the most effective procedure. Method 3 (cell lysis with
NaOH; Extended Data Table 3) was found to be the most effective and reproducible,
so we further optimized it by resuspending the pellets in ice-cold KPi
(50 mM, 250 µl) before addition of NaOH to decrease dephosphorylation after cell
lysis (Method 4). Cell pellets were then processed as described above. See above for 
the final extraction protocol. 

Preparation of the unnatural insert for pINF construction. The TK-1-dNaM 
oligonucleotide containing dNaM was prepared using solid-phase DNA synthesis 
with ultra-mild DNA synthesis phosphoramidites on CPG ultramild supports
(1 µmol, Glen Research, Sterling, Virginia, USA) and an ABI Expedite 8905 synthesizer.
After the synthesis, the DMT-ON oligonucleotide was cleaved from the
solid support, deprotected and purified by Glen-Pak cartridge according to the 
manufacturer’s recommendation (Glen Research), and then subjected to 8 M urea 
8% PAGE. The gel was visualized by ultraviolet shadowing, the band corresponding 
to the 75-mer was excised, and the DNA was recovered by crush and soak extraction, 
filtration (0.45 µm), and final desalting over Sephadex G-25 (NAP-25 Columns, GE
Healthcare). The concentration of the single stranded oligonucleotide was deter- 
mined by ultraviolet absorption at 260 nm assuming that the extinction coefficient of 
dNaM at 260 nm is equal to that of dA. TK-1-dNaM (4 ng) was next amplified by 
PCR under the following conditions: 1 × OneTaq reaction buffer, MgSO4 adjusted
to 3.0 mM, 0.2 mM of dNTP, 0.1 mM of dNaMTP, 0.1 mM of the d5SICSTP analogue
dTPT3TP, 1 µM of each of the primers pUC19-fusion-forward and
pUC19-fusion-reverse, and 0.02 U µl⁻¹ of OneTaq DNA Polymerase (in a total of 4 × 50 µl
reactions) under the following thermal cycling conditions: initial denaturation (96 °C,
1 min) followed by 12 cycles of denaturation (96 °C, 10 s), annealing (60 °C, 15 s),
and extension (68 °C, 2 min). An identical PCR without the unnatural triphosphates 
was run to obtain fully natural insert under identical conditions for the construction 
of the natural control plasmid. Reactions were subjected to spin column purification 
and then the desired PCR product (122 bp) was purified by a 4% agarose gel. 
pUC19 linearization for pINF construction. pUC19 (20 ng) was amplified by 
PCR under the following conditions: 1 × Q5 reaction buffer, MgSO4 adjusted to
3.0 mM, 0.2 mM of dNTP, 1 µM of each of the primers pUC19-lin-forward and
pUC19-lin-reverse, and 0.02 U µl⁻¹ of Q5 Hot Start High-Fidelity DNA Polymerase (in a
total of 4 × 50 µl reactions with one reaction containing 0.5 × Sybr Green I) under
the following thermal cycling conditions: initial denaturation (98 °C, 30 s); 20 cycles 
of denaturation (98 °C, 10 s), annealing (60 °C, 15 s), and extension (72 °C, 2 min); 
and final extension (72 °C, 5 min). The desired PCR product (2,611 bp) was purified 
by a 2% agarose gel. 

PCR assembly of pINF and the natural control plasmid. A linear fragment was 
amplified from pUC19 using primers pUC19-lin-forward and pUC19-lin-reverse. 
The resulting product (800 ng, 4.6 × 10⁻¹³ mol) was combined with either the natural
or unnatural insert (see above) (56 ng, 7.0 × 10⁻¹³ mol) and assembled by circular
overlap extension PCR under the following conditions: 1 × OneTaq reaction
buffer, MgSO4 adjusted to 3.0 mM, 0.2 mM of dNTP, 0.1 mM of dNaMTP, 0.1 mM
of the d5SICSTP analogue dTPT3TP, and 0.02 U µl⁻¹ of OneTaq DNA Polymerase
(in a total of 4 × 50 µl reactions with one reaction containing 0.5 × Sybr Green I)
using the following thermal cycling conditions: initial denaturation (96 °C, 1 min);
12 cycles of denaturation (96 °C, 30 s), annealing (62 °C, 1 min), and extension (68 °C,
5 min); final extension (68 °C, 5 min); and slow cooling (68 °C to 10 °C at a rate of
−0.1 °C s⁻¹). The PCR product was analysed by restriction digestion on 1% agarose
and used directly for E. coli transformation. The d5SICS analogue dTPT3
pairs with dNaM, and dTPT3TP was used in place of d5SICSTP because DNA containing
dTPT3-dNaM is amplified better by PCR than DNA containing d5SICS-dNaM.
This substitution allowed synthetic pINF to be differentiated from pINF replicated
in vivo and facilitated the construction of high-quality pINF (UBP content >99%).
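
The molar amounts quoted for the assembly reaction can be checked with a short mass-to-moles conversion. The sketch below is illustrative only: it assumes an average mass of about 660 g/mol per base pair of double-stranded DNA (a common rule of thumb that the text does not state), and the function name is hypothetical.

```python
# Illustrative sketch; the 660 g/mol per bp average is an assumption, not from the text.
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of double-stranded DNA (rule of thumb)


def dsdna_mol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to moles of molecules."""
    return mass_ng * 1e-9 / (length_bp * AVG_MASS_PER_BP)


if __name__ == "__main__":
    vector = dsdna_mol(800, 2611)  # linearized pUC19 fragment
    insert = dsdna_mol(56, 122)    # natural or unnatural insert
    print(f"vector: {vector:.2e} mol, insert: {insert:.2e} mol, "
          f"insert:vector = {insert / vector:.2f}")
```

Under this assumption the two amounts come out close to the stated 4.6 × 10⁻¹³ mol and 7.0 × 10⁻¹³ mol, that is, an insert-to-vector molar ratio of about 1.5.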
Preparation of electrocompetent cells for pINF replication in E. coli. C41(DE3) 
cells were transformed by heat shock with 200 ng of pACS plasmid, and the transformants
were selected overnight on 2 × YT-agar supplemented with streptomycin.
A single clone of freshly transformed C41(DE3) pACS was grown overnight in
2 × YT medium (3 ml) supplemented with streptomycin and KPi (50 mM). After
100-fold dilution into the same fresh 2 × YT media (300 ml), the cells were grown
at 37 °C until they reached an OD600 of 0.20, at which time IPTG was added to a
final concentration of 1 mM to induce the expression of PtNTT2. Cells were grown
for another 40 min and then growth was stopped by rapid cooling in ice water with
intensive shaking. After centrifugation in a prechilled centrifuge (2,400 r.c.f. for 10 min,
4 °C), the spent media was removed, and the cells were prepared for electroporation
by washing with ice-cold sterile water (3 × 150 ml). After washing, the cells
were resuspended in ice-cold 10% glycerol (1.5 ml) and split into 50-µl aliquots.
Although we found that dry ice yielded better results than liquid nitrogen for freezing 
cells to store for later use, freshly prepared cells were used for all reported experi- 
ments as they provided higher transformation efficiency of pINF and higher rep- 
lication fidelity of the UBP. 

Electroporation and recovery for pINF replication in E. coli. The aliquot of cells 
was mixed with 2 µl of plasmid (400 ng), transferred to a 0.2 cm gap electroporation
cuvette and electroporated using a Bio-Rad Gene Pulser according to the manufacturer’s
recommendations (voltage 25 kV, capacitor 2.5 µF, resistor 200 Ω, time
constant 4.8 ms). Pre-warmed 2 × YT media (0.95 ml, streptomycin, 1 mM IPTG,
50 mM KPi) was added, and after mixing, 45 µl was removed and combined with
105 µl of the same media (3.33-fold dilution) supplemented with 0.25 mM of dNaMTP
and d5SICSTP. The resulting mixture was allowed to recover for 1 h at 37 °C with
shaking (210 revolutions per min (r.p.m.)). The original transformation media
(10 µl) was spread onto 2 × YT-agar containing streptomycin with 10- and 50-fold
dilutions for the determination of viable colony forming units after overnight growth 
at 37 °C to calculate the number of the transformed pINF molecules (see the section 
on calculation of the plasmid amplification). Transformation, recovery and growth 
were carried out identically for the natural control plasmid. In addition, a negative 
control was run and treated identically to pINF transformation except that it was 
not subjected to electroporation (Extended Data Fig. 7b). No growth in the untrans- 
formed negative control samples was observed even after 6 days. No PCR amp- 
lification of the negative control was detected, which confirms that unamplified 
pINF plasmid is not carried through cell growth and later detected erroneously as 
the propagated plasmid. 

Analysis of pINF replication in E. coli. After recovery, the cells were centrifuged 
(4,000 r.c.f. for 5 min, 4 °C), and spent media (0.15 ml) was removed and analysed 
for nucleotide composition by HPLC (Extended Data Fig. 7a). The cells were resuspended
in fresh 2 × YT media (1.5 ml, streptomycin, ampicillin, 1 mM IPTG, 50 mM
KPi, 0.25 mM dNaMTP, 0.25 mM d5SICSTP) and grown overnight at 37 °C while
shaking (250 r.p.m.), resulting in tenfold dilution compared to recovery media or
33.3-fold dilution compared to the originally transformed cells. Aliquots (100 µl)
were taken after 15, 19, 24, 32, 43, 53, 77 and 146 h, OD600 was determined, and the
cells were centrifuged (8,000 r.c.f. for 5 min, 4 °C). Spent media were analysed for
nucleotide composition by HPLC (Extended Data Fig. 7a), and the pINF and pACS
plasmid mixtures were recovered and linearized with NdeI restriction endonuclease;
pINF plasmid was purified by 1% agarose gel electrophoresis (Extended Data Fig. 7b) 
and analysed by LC-MS/MS. The retention of the UBP on the pINF plasmid was 
quantified by biotin gel shift mobility assay and sequencing as described below. 
Mass spectrometry of pINF. Linearized pINF was digested to nucleosides by treat- 
ment with a mixture of nuclease P1 (Sigma-Aldrich), shrimp alkaline phosphatase 
(NEB), and DNase I (NEB), overnight at 37 °C, following a previously reported 
protocol”. LC-MS/MS analysis was performed in duplicate by injecting 15 ng of 
digested DNA on an Agilent 1290 UHPLC equipped with a G4212A diode array 
detector and a 6490A Triple Quadrupole Mass Detector operating in the positive 
electrospray ionization mode (+ESI). UHPLC was carried out using a Waters XSelect
HSS T3 XP column (2.1 × 100 mm, 2.5 µm) with the gradient mobile phase consisting
of methanol and 10 mM aqueous ammonium formate (pH 4.4). MS data
acquisition was performed in Dynamic Multiple Reaction Monitoring (DMRM)
mode. Each nucleoside was identified in the extracted chromatogram associated
with its specific MS/MS transition: dA at m/z 252→136, d5SICS at m/z 292→176,
and dNaM at m/z 275→171. External calibration curves with known amounts of
the natural and unnatural nucleosides were used to calculate the ratios of individual
nucleosides within the samples analysed. LC-MS/MS quantification was validated
using synthetic oligonucleotides containing unnatural d5SICS and dNaM
(Extended Data Table 2).

DNA biotinylation by PCR to measure fidelity by gel shift mobility assay. 
Purified mixtures of pINF and pACS plasmids (1 ng) from growth experiments 
were amplified by PCR under the following conditions: 1 × OneTaq reaction buffer,
MgSO4 adjusted to 3.0 mM, 0.3 mM of dNTP, 0.1 mM of the biotinylated
dNaMTP analogue dMMO2SSBIOTP, 0.1 mM of d5SICSTP, 1 µM of each of the
primers pUC19-seq-forward and pUC19-seq-reverse, 0.02 U µl⁻¹ of OneTaq DNA
Polymerase, and 0.0025 U µl⁻¹ of DeepVent DNA Polymerase in a total volume of
25 µl in a CFX Connect Real-Time PCR Detection System (Bio-Rad) under the
following thermal cycling conditions: initial denaturation (96 °C, 1 min); 10 cycles
of denaturation (96 °C, 30 s), annealing (64 °C, 30 s), and extension (68 °C, 4 min).
PCR products were purified, and the resulting biotinylated DNA duplexes (5 µl,
25-50 ng) were mixed with streptavidin (1 µl, 1 µg µl⁻¹, Promega) in phosphate
buffer (50 mM sodium phosphate, pH 7.5, 150 mM NaCl, 1 mM EDTA), incubated
for 30 min at 37 °C, mixed with 5 × non-denaturing loading buffer (Qiagen), and
loaded onto 6% non-denaturing PAGE. After running at 110 V for 30 min, the gel
was visualized and quantified. The resulting fragment (194 bp) with primer regions
underlined and the unnatural nucleotide in bold (X represents dNaM or its biotinylated
analogue dMMO2SSBIO) is 5’-GCAGGCATGCAAGCTTGGCGTAATC
ATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAXTTCCA
CACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTA 
ATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTT 
CCAGTCGGGAAACCTGTCGTGCCAG. 

Streptavidin shift calibration for gel shift mobility assay. We have already reported 
a calibration between streptavidin shift and the fraction of sequences with UBP in 
the population (see Supplementary Fig. 8 of ref. 1). However, we found that spiking 
the PCR reaction with DeepVent improves the fidelity with which DNA containing
d5SICS-dMMO2SSBIO is amplified, and thus we repeated the calibration with
added DeepVent. To quantify the net retention of the UBP, nine defined mixtures
of the TK-1-dNaM template and its fully natural counterpart were prepared (Extended
Data Fig. 6a), subjected to biotinylation by PCR and analysed by mobility-shift assay
on 6% non-denaturing PAGE as described above. For calibration, the mixtures of the
TK-1-dNaM template and its fully natural counterpart with a known ratio of
unnatural and natural templates (0.04 ng) were amplified under the same condi- 
tions over nine cycles of PCR with pUC19-fusion primers and analysed identically 
to samples from the growth experiment (see the section on DNA biotinylation by 
PCR). Each experiment was run in triplicate (a representative gel assay is shown in 
Extended Data Fig. 6b), and the streptavidin shift (SAS, %) was plotted as a function
of the UBP content (UBP, %). The data were then fitted to a linear equation, SAS =
0.77 × UBP + 2.0 (R² = 0.999), where UBP corresponds to the retention of the
UBP (%) in the analysed samples after cellular replication and was calculated from
the SAS shift using the equation above.
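
As a small worked illustration of how the calibration is used, the linear fit can be inverted to convert a measured streptavidin shift into UBP retention. The slope and intercept below are the fitted values quoted above; the function names are hypothetical.

```python
# Illustrative sketch; slope and intercept are the fitted values quoted in the text.
SLOPE = 0.77      # SAS (%) per UBP (%)
INTERCEPT = 2.0   # SAS (%) at 0% UBP


def sas_from_ubp(ubp_percent):
    """Expected streptavidin shift (%) for a given UBP content (%)."""
    return SLOPE * ubp_percent + INTERCEPT


def ubp_from_sas(sas_percent):
    """UBP retention (%) inferred from a measured streptavidin shift (%)."""
    return (sas_percent - INTERCEPT) / SLOPE


if __name__ == "__main__":
    for sas in (2.0, 40.5, 79.0):
        print(f"SAS = {sas:5.1f}%  ->  UBP = {ubp_from_sas(sas):6.1f}%")
```

Under this fit, a fully unnatural template is expected to give a shift of about 79% rather than 100%, which is why the raw shift has to be rescaled before it is read as retention.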

Calculation of plasmid amplification. The cells were plated on 2 × YT-agar
containing ampicillin and streptomycin directly after transformation with pINF,
and the colonies were counted after overnight growth at 37 °C. Assuming each cell 
is only transformed with one molecule of plasmid, colony counts correspond to the 
original amount of plasmid that was taken up by the cells. After overnight growth, 
the plasmids were purified from a specific volume of the cell culture and quantified. 
As purified plasmid DNA represents a mixture of the pINF and pACS plasmids, 
restriction digestion analysis with NdeI endonuclease was performed to linearize
both plasmids, followed by 1% agarose gel electrophoresis (Extended Data Fig. 7b). 
An example of calculations for the 19-h time point with one of three triplicates is 
provided in Supplementary Information. 
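
The authors' worked example is in the Supplementary Information and is not reproduced here. As a rough sketch of the kind of arithmetic involved, the number of plasmid doublings can be estimated from the colony count (initial molecules) and the mass of pINF recovered after growth. Everything below is an assumption made for illustration: the function names, the 660 g/mol per base pair average, the pINF length, and the example numbers are hypothetical, and corrections for culture volume and dilution are left out.

```python
import math

# Illustrative sketch only; all names and example values are hypothetical.
AVOGADRO = 6.022e23
AVG_MASS_PER_BP = 660.0  # g/mol per bp of double-stranded DNA (rule of thumb)


def plasmid_molecules(total_dna_ng, pinf_mass_fraction, pinf_length_bp):
    """Number of pINF molecules in a purified pINF/pACS mixture.

    pinf_mass_fraction -- share of the DNA mass that is pINF, e.g. from gel densitometry
    """
    pinf_mol = total_dna_ng * 1e-9 * pinf_mass_fraction / (pinf_length_bp * AVG_MASS_PER_BP)
    return pinf_mol * AVOGADRO


def doublings(initial_molecules, final_molecules):
    """Number of doublings needed to go from the initial to the final copy count."""
    return math.log2(final_molecules / initial_molecules)


if __name__ == "__main__":
    n0 = 5_000  # colony count, assuming one plasmid molecule per transformed cell
    n1 = plasmid_molecules(total_dna_ng=100, pinf_mass_fraction=0.5, pinf_length_bp=2700)
    print(f"{doublings(n0, n1):.1f} doublings")
```

The one-plasmid-per-cell assumption is the one stated in the text; the remaining inputs would come from the plasmid quantification and the gel analysis.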

Fragment generation for Sanger sequencing to measure fidelity. Purified mix- 
tures of pINF and pACS plasmids (1 ng) after the overnight growth were amplified 
by PCR under the following conditions: 1 × OneTaq reaction buffer, MgSO4 adjusted
to 3.0 mM, 0.2 mM of dNTP, 0.1 mM of dNaMTP, 0.1 mM of the d5SICSTP analogue
dTPT3TP, 1 µM of each of the primers pUC19-seq2-forward and
pUC19-seq-reverse (see below), and 0.02 U µl⁻¹ of OneTaq DNA Polymerase in a total
volume of 25 µl under the following thermal cycling conditions: initial denaturation
(96 °C, 1 min); and 10 cycles of denaturation (96 °C, 30 s), annealing (64 °C, 30 s),
and extension (68 °C, 2 min). Products were purified by spin column, quantified to 
measure DNA concentration and then sequenced as described below. The sequenced 
fragment (304 bp) with primer regions underlined and the unnatural nucleotide in 
bold (X, dNaM) is 5’-GCTGCAAGGCGATTAAGTTGGGTAACGCCAGGGT
TTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAATTCGAGCTCG 
GTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTTGGCG 
TAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAX 
TTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTG 
CCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCG 
CTTTCCAGTCGGGAAACCTGTCGTGCCAG. 

Sanger sequencing. The cycle sequencing reactions (10 µl) were performed on a
9800 Fast Thermal Cycler (Applied Biosystems) with the Cycle Sequencing Mix (0.5 µl)
of the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems) con- 
taining 1 ng template and 6 pmol of sequencing primer pUC19-seq-reverse under 
the following thermal cycling conditions: initial denaturation (98 °C, 1 min); and 
25 cycles of denaturation (96 °C, 10 s), annealing (60 °C, 15s), and extension (68 °C, 
2.5 min). Upon completion, the residual dye terminators were removed from the 
reaction with Agencourt CleanSEQ (Beckman-Coulter, Danvers, Massachusetts, 
USA). Products were eluted off the beads with deionized water and sequenced
directly on a 3730 DNA Analyzer (Applied Biosystems). Sequencing traces were 
collected using Applied Biosystems Data Collection software v3.0 and analysed 
with the Applied Biosystems Sequencing Analysis v5.2 software. 

Analysis of Sanger sequencing traces. Sanger sequencing traces were analysed as 
described previously to determine the retention of the unnatural base pair. In brief,
the presence of an unnatural nucleotide leads to a sharp termination of the sequencing 
profile, whereas mutation to a natural nucleotide results in ‘read-through’. The extent 
of this read-through after normalization is inversely correlated with the retention 
of the unnatural base pair. Raw sequencing traces were analysed by first adjusting 
the start and stop points for the Sequencing Analysis software (Applied Biosystems) 
and then determining the average signal intensity individually for each channel (A,
C, G and T) for peaks within the defined points. This was done separately for the
parts of the sequencing trace before (section L) and after (section R) the unnatural
nucleotide. The R/L ratio after normalization, (R/L)norm, which corrects for sequencing
decay and read-through in the control unamplified sample (R/L = 0.55(R/L)norm + 7.2; see
ref. 26 for details), corresponds to the percentage of the natural sequences in the
pool. Therefore, the overall retention (F) of the unnatural base
pair during PCR is equal to 1 − (R/L)norm. As significant read-through (over 20%)
was observed in the direction of the pUC19-seq2-forward primer even with the
control plasmid (synthetic pINF), only sequencing in the opposite direction
(pUC19-seq-reverse) was used to gauge fidelity. Raw sequencing traces are shown in Fig. 2f
and provided as Supplementary Data.
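
The normalization step above can be written out as a short calculation. This is a sketch under one explicit assumption: the raw R/L ratio and (R/L)norm are both expressed in percent, so that "1 − (R/L)norm" becomes "100 − (R/L)norm". The coefficients 0.55 and 7.2 are the ones quoted in the text; the function names are hypothetical.

```python
# Illustrative sketch; assumes R/L and (R/L)norm are expressed in percent.
SLOPE = 0.55     # from R/L = 0.55 * (R/L)norm + 7.2
INTERCEPT = 7.2  # read-through (%) seen in the control unamplified sample


def normalized_read_through(r_over_l_percent):
    """(R/L)norm: estimated percentage of natural sequences in the pool."""
    return (r_over_l_percent - INTERCEPT) / SLOPE


def ubp_retention(r_over_l_percent):
    """Overall retention F (%) of the unnatural base pair."""
    return 100.0 - normalized_read_through(r_over_l_percent)


if __name__ == "__main__":
    for raw in (7.2, 12.7, 34.7):
        print(f"R/L = {raw:4.1f}%  ->  retention = {ubp_retention(raw):5.1f}%")
```

A raw ratio of 7.2% therefore maps to full retention, because that level of read-through is already present in the control.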
a
Deoxyribotriphosphates (dG, dA, dC, dT); Ribotriphosphates (G, A, C, U)
Transporter dG dA dC dT G A C U Reference
PtNTT2 82 271 31 428 78 197 49 86 18
TpNTT2 74 665 60 251 59 49 33 90 18
PamNTT2 156 437 570 676 15
PamNTT3 1320 15
PamNTT5 121 22 360 15
SnNTT2 179 654 35
SnNTT3 42 407 375 9 34 35
RpNTT2 substrate is unknown 36


Extended Data Figure 1 | Natural triphosphate uptake by NTTs. a, Survey of
reported substrate specificity (Km, µM) of the NTTs assayed in this study.
b, PtNTT2 is significantly more active in the uptake of [α-32P]-dATP compared
to other nucleotide transporters. Raw (left) and processed (right) data are
shown. Relative radioactivity corresponds to the total number of counts
produced by each sample. Interestingly, both PamNTT2 and PamNTT5 exhibit
a measurable uptake of dATP although this activity was not reported before.
This can possibly be explained by the fact that substrate specificity was only
characterized using competition experiments, and assay sensitivity might
not have been adequate to detect this activity. References 35, 36 are cited in
this figure.

Extended Data Figure 2 | Degradation of unnatural triphosphates in
growth media. Unnatural triphosphates (3P) of dNaM and d5SICS are
degraded to diphosphates (2P), monophosphates (1P) and nucleosides (0P) in
the growing bacterial culture. Potassium phosphate (KPi) significantly slows
down the dephosphorylation of both unnatural triphosphates.
a, Representative HPLC traces (for the region between ~20 and 24 min).
dNaM and d5SICS nucleosides are eluted at approximately 40 min and not
shown. b, Composition profiles.

Extended Data Figure 3 | Effect of potassium phosphate on dATP uptake
and stability in growth media. a, KPi inhibits the uptake of [α-32P]-dATP at
concentrations above 100 mM. Raw (left) and processed (right) data are shown.
The NTT from Rickettsia prowazekii (RpNTT2) does not mediate the uptake of
any of the dNTPs and was used as a negative control: its background signal
was subtracted from those of PtNTT2 (black bars) and TpNTT2 (white bars).
Relative radioactivity corresponds to the total number of counts produced by
each sample. b, KPi (50 mM) significantly stabilizes [α-32P]-dATP in the
media. Triphosphate stability in the media is not significantly affected
by the nature of the NTT expressed. 3P, 2P and 1P correspond to triphosphate,
diphosphate and monophosphate states, respectively. Error bars represent
s.d. of the mean, n = 3.

Extended Data Figure 4 | dATP uptake and growth of cells expressing
PtNTT2 as a function of inducer (IPTG) concentration. Growth curves and
[α-32P]-dATP uptake by bacterial cells transformed with pCDF-1b-PtNTT2
(pACS) plasmid as a function of IPTG concentration. a, Total uptake of
radioactive substrate (left) and total intracellular triphosphate content (right)
are shown at two different time points. Relative radioactivity corresponds to the
total number of counts produced by each sample. b, A stationary phase culture
of C41(DE3) pACS cells was diluted 100-fold into fresh 2 × YT media
containing 50 mM KPi, streptomycin, and IPTG at the indicated
concentrations and were grown at 37 °C. Error bars represent s.d. of the
mean, n = 3.

Extended Data Figure 5 | Stability and uptake of dATP in the presence of
50 mM KPi and 1 mM IPTG. Composition of [α-32P]-dATP in the media
(left) and cytoplasmic fraction (right) as a function of time. TLC images and
their quantifications are shown at the bottom and the top of each of the panels,
respectively. 3P, 2P and 1P correspond to nucleoside triphosphate, diphosphate
and monophosphate, respectively. M refers to a mixture of all three compounds
that was used as a TLC standard. The position labelled ‘Start’ corresponds to
the position of sample spotting on the TLC plate.

Extended Data Figure 6 | Calibration of the streptavidin shift (SAS). a, The
SAS is plotted as a function of the fraction of template containing the UBP.
Error bars represent s.d. of the mean, n = 3. b, Representative data.
SA, streptavidin.

Extended Data Figure 7 | Decomposition of unnatural triphosphates, pINF
quantification, and retention of the UBP with extended cell growth.
a, Dephosphorylation of the unnatural nucleoside triphosphate. 3P, 2P, 1P and
0P correspond to triphosphate, diphosphate, monophosphate and nucleoside
states, respectively. The composition at the end of the 1 h recovery is shown at
the right. b, Restriction analysis of pINF and pACS plasmids purified from
E. coli, linearized with NdeI restriction endonuclease and separated on a 1%
agarose gel (assembled from independent gel images). Molar ratios of pINF/pACS
plasmids are shown at the top of each lane. For each time point, triplicate
data are shown in three lanes with the untransformed control shown in the
fourth, rightmost lane (see Methods). c, Number of pINF doublings as a
function of time. The decrease starting at approximately 50 h is due to the loss
of the pINF plasmid that also results in increased error. See the section on pINF
replication in E. coli in the Methods for details. d, UBP retention (%) as a
function of growth as determined by gel shift (data shown in Fig. 3) and Sanger
sequencing (sequencing traces are available as Supplementary Data). In a, c
and d, error shown is the s.d. of mean, n = 3.

Extended Data Table 1 | OD600 of E. coli cultures and relative copy
number of plasmid (pINF or control pUC19) as determined by its
molar ratio to pACS after 19 h of growth

Plasmid IPTG dXTP/dYTP Relative copy number OD600 (15 h) OD600 (19 h)
= ry 18 0.34. 0.75 

pINF + = 46 0.15 0.75 
a + 8.9 3.13 3.98 

+ 2.8 0.54 = 1.25 

pUC19 a 2.6 0.73 1.39


X, NaM; Y, 5SICS. 
Extended Data Table 2 | Relative quantification by LC-MS/MS using synthetic oligonucleotides containing d5SICS and dNaM

Oligonucleotide (ssDNA) | Size (nt) | dA/d5SICS, exp. (calcd.)* | dA/dNaM, exp. (calcd.)*
D6-NaM | 82 | | 22.5 (23.5)
D6-5SICS | 87 | 23.4 (25.5) |
D13-NaMx2 | 130 | | 16.1 (19.5)

Sequences:
D6-NaM: CAC ACA GGA AAC AGC TAT GAC CCG GGT TAT TAC ATG CGC TAG CAC TTG GAA TTC ACC AG ACG NNN NaM NNN CGG GAC CCA TAG T
D6-5SICS: GAA ATT AAT ACG ACT CAC TAT AGG GTT AAG CTT AAC TTT AAG AAG GAG ATT TAC TAT GGG TCC CG NNN 5SICS NNN CGT CTG GTG AAT TCC
D13-NaMx2: CAC ACA GGA AAC AGC TAT GAC CCG GGT TAT TAC ATG CGC TAG CAC TTG GAA TTC ACT ATC AC NaM AGT CAC NaM AGT AAT CCA TAG TAA ATC TCC TTG TTA AGC TTA ACC CTA TAG TGA GTC GTA TTA ATT TCT

* dA/d5SICS and dA/dNaM ratios were calculated assuming that randomized nucleotides (N) around the unnatural base are distributed equally.

Extended Data Table 3 | Summary of the most successful extraction methods

Method | Protocol summary | Total recovery (%)*: dNaM, d5SICS | Triphosphate stability (%)†: dNaM, d5SICS | Ref.
1. TCA with Freon | Lyse with cold TCA; extract aqueous phase using Freon with trioctylamine solution | 38, 23 | 92, 99 | Adapted from ref. 37
2. PCA w/ KOH | Lyse with cold PCA; precipitate proteins with KOH and KPi | 36, 21 | 98, 77 | Adapted from ref. 38
3. NaOH w/ KOAc | Lyse with NaOH and SDS; precipitate proteins with potassium acetate | 21, 26 | 86, 100 | see footnotes
4. NaOH w/ KOAc supplemented w/ KPi (50 mM) | Suspend cells in KPi; lyse with NaOH and SDS; precipitate proteins with potassium acetate | 38, 30 | 99, 100 | see footnotes

* Recovery of all nucleotides (3P, 2P, 1P and nucleoside).
† Calculated as a ratio of 3P composition (%) before and after the extraction.
References 37, 38 are cited in this figure. Details of methods 3 and 4 can be found online (http://2013.igem.org/wiki/images/e/ed/BGU_purelink_quick_plasmid_qre.pdf).

CAREER GAPS


Maternity muddle 


The support available for childbirth and rearing varies wildly. 
New parents come up with creative ways to juggle demands. 


BY AMANDA MASCARELLI 


Marine scientist Maria Granberg gave
birth to her first child in 2006 and
took roughly a year off at 80% pay. 
But in 2010, when she became pregnant with 
her second child, she was at a more challeng- 
ing career stage. By then an assistant professor 
at the University of Gothenburg in Sweden, 
she was building momentum, planning 
research activities and establishing collabora- 
tions — and had yet to recruit staff for her lab 
to keep the research moving in her absence. 
In addition, her research on marine- 
sediment bacteria involved work with solvents 
that are unsafe to handle during pregnancy. 


Neither her department nor her granting body, 
the Swedish Research Council Formas, could 
offer her funding for a lab assistant. Between 
a pregnancy that restricted her work and six 
months of partial maternity leave, she lost a full 
season of field samples and a year of work. The 
losses resulted in missed publication opportu- 
nities and altered her long-term plans for the 
project. “That’s a very stressful thing,” she says.
“The pregnancy comes along and kind of
disturbs your plan.”

Four years on, Granberg says that she is 
finally regaining traction, but she still views 
that period as troublesome to her career.
“Science is a lot about timing, having momentum,
and I think the scientific community could
do a lot about this period when you are
pregnant and you cannot be in the lab,” she says.
For example, Granberg believes that research 
councils and granting agencies should estab- 
lish funding to support female researchers’ 
programmes during pregnancy and directly 
after their return from a maternity leave. 
“Efforts should be aimed at sustaining the 
research momentum at the same level as that 
of their colleagues,” she says. 

Her experience illustrates the challenges 
for women who are trying to juggle the 
obligations of the lab with the demands of 
motherhood. The decision to start a family 
often clashes with the time when research- 
ers are launching their careers and working 
towards tenure. Maternity-leave benefits and 
experiences vary widely among employers 
and at institutions across the globe, and even 
between departments. The amount of time 
a woman spends working during her leave 
depends on variables such as personal desires, 
career phase, the nature of the research and 
the institution's policies. 

Most researcher parents agree that there is 
no perfect time in a career for the arrival of a 
child. Already having tenure might offer job 
security, but it also comes with the respon- 
sibility of overseeing lab members and pro- 
jects. Having a child during the early days of 
a career means that there may not be major 
projects or personnel to oversee — but it can 
interrupt the momentum of the research. 

Early-career researchers who are consider- 
ing parenthood or who are preparing for a 
child need to be aware of the challenges that 
are likely to await them. Depending on their 
location and employer, some or all of their 
maternity leave may have to be unpaid, they 
may need to work from home at times, and 
they could well need to find ways to keep their 
research programme afloat. 

It is a tough road to navigate, especially for 
those who are just getting under way. Peo- 
ple who have travelled down that road warn 
that junior researchers who are considering 
parenthood should find out specifically and 
well in advance what benefits their employer 
offers and be ready to negotiate if they do not 
exist or do not meet their needs. 


AVENUES OF SUPPORT 

No matter how supportive the work environ- 
ment, breaking away from research can be 
fraught with difficult decisions, and there is 
no universal formula for how to satisfy the 
twin demands of work and family. Some
institutions and agencies offer support
through a mixture of programmes and poli- 
cies — but with varying levels of success. 

The US National Science Foundation in 
Arlington, Virginia, for example, is starting to 
incorporate ‘family-friendly practices’ into its 
framework. Its Career-Life Balance Initiative
was launched in 2011 and provides, among 
other offerings, supplemental funding of three 
months’ salary — about US$12,000 — to help 
principal investigators, postdocs and gradu- 
ate research fellows to pay for a technician 
during their maternity leave. And the Euro- 
pean Research Council has adapted its rules 
to allow principal investigators who become 
parents additional time for each child born to 
apply for specific grants. 


TIME TO NEGOTIATE 

Some institutions offer very little in the way of 
formal support. Gretchen Hansen was finish- 
ing her PhD in limnology and marine science 
at the University of Wisconsin-Madison when 
she became pregnant with her daughter. She 
told her adviser about her pregnancy and asked 
about the maternity policy for graduate
students. “Nobody knew,” she says, “and there was
no policy.” 

She and her adviser then devised a plan 
that would allow her to drop down to a 
33% appointment during her leave, reduce 
her salary by one-third and keep her health 
insurance. Hansen spent about five months 
working from home when she could, and 
spending one day a week in the lab. 

She is now a fisheries research scientist 
at the Wisconsin Department of Natural 
Resources in Madison and is expecting her 
second child next month. She is not eligible
for specific maternity benefits at her new
institution, and can only claim the minimum
12 weeks of unpaid leave offered by US law.
She plans to use holiday time, sick leave and
the ‘disability allowance’ for which employees
become eligible when pregnant to cover most
of her salary for about nine of those weeks.

Jenny Briggs occasionally took her daughter with
her on fieldwork trips in the early months.

Samantha Joye, an oceanographer at the
University of Georgia in Athens, slogged
her way through a similarly knotty maternity
experience. A tenured professor, she
gave birth to her now-6-year-old daughter
and worked mainly from home for about 8
months, writing several papers and submitting
two grant proposals before her daughter
was 6 months old. Because the university did
not provide any paid maternity leave, she used
sick leave to get partial pay during that time.
In 2012, when her twins were born, she did
much the same thing. She and her husband,
Christof Meile, a biogeochemical modeller at
the same university, juggled their parenting
around each other's tenure demands.


COPING TIPS

How to maximize maternity benefits

Researcher parents have come up with
a host of ideas that have helped them to
juggle the demands of work and family.
Here are a few:

• If possible, hire an assistant or technician
to help run lab or field duties during
maternity leave.

• Sit down with your department head or
chair several months in advance to discuss
goals, identify possible support and plan out
the return from maternity leave.

• Form a support network of other scientist
mums and dads. Facebook groups are one
way to build up this type of community. “It’s
incredibly important,” says Jenny Briggs,
an ecologist at the US Geological Survey in
Denver, Colorado. “Other mothers who do
this have been my greatest allies.”

• When trying to juggle work and parenting,
figure out when you are most productive.
“You want to be strategic and realistic about
what time of day you’re going to work on
what,” says Natalie Boelman, an Earth
scientist at Columbia University in New York.
During the early days after her daughter’s
birth, Boelman found that she needed
to sleep during her daughter’s morning
nap but could write efficiently when her
daughter slept in the afternoon.

• As much as possible, share parenting
responsibilities 50-50 from day one, says
Susana Martinez-Conde, a neuroscientist at
the Barrow Neurological Institute in Phoenix,
Arizona. She and her husband, Stephen
Macknik, also a neuroscientist at Barrow,
have shared parenting equally for all three
of their children. When necessary, says
Martinez-Conde, “we take turns, where one
person may be writing a paper or grant and
the other takes the kids to the park”. A.M.


© 2014 Macmillan Publishers Limited. All rights reserved

In the early days, most parents have not 
yet organized formal child-care services, 
which presents demands when trying to keep 
a research programme running. When Joye 
started venturing back onto campus three 
months after the birth of her daughter, she 
brought her child with her when meeting stu- 
dents, attending seminars and joining lab or 
faculty meetings. Most of her colleagues and 
university staff were supportive of the pres- 
ence of her child, she says. 

Mara Dierssen, a neuroscientist at the Center 
for Genomic Regulation in Barcelona, Spain, 
took about four months of leave after each of
her four children was born, but found creative
ways to stay in touch with lab members,
including walks through the park with her PhD 
students, with her child in a pushchair. 

Hopi Hoekstra, an evolutionary geneticist 
at Harvard University in Cambridge, Massa- 
chusetts, took three months off after her son 
was born in 2012. But about a month in, she 
began bringing the baby into the lab, enter- 
taining him in a baby seat and sometimes 
rocking him to sleep during lab meetings. 
Her husband, also an evolutionary geneticist 
at Harvard, took their son into his office as 
well, where he kept baby gear. In retrospect, 
she adds, “I wish I had been more off-line. The 
team did just fine on its own.”

Trying to include lab time and face time 
with graduate students and other lab mem- 
bers during parental leave is difficult enough, 
but many researchers also have to contend 
with the demands of fieldwork. Managing 
that schedule takes some nimble and creative 
planning, they say (see ‘Coping tips’). Jenny 
Briggs, an ecologist at the US Geological Sur- 
vey in Denver, Colorado, was working on a 
project in the Rocky Mountains when her 
second child was due in 2010. “In my work, 
biological changes happen season by season,” 
she says. “You can’t put annual studies on hold 
for a year.” So she arranged with her boss to 
funnel the money that she would have been 
earning had she not taken unpaid leave to 
a graduate student who could take over the 
study. She returned to fieldwork just weeks 
after her daughter was born, sometimes bring- 
ing her infant to breastfeed and her mother to 
assist. Usually, she left the baby at home with 
a carer and took breaks behind trees or in the 
field truck to pump breast milk. 


TOP NOTCH 

Some universities offer exceptional maternity 
and paternity benefits. Many academic insti- 
tutions in Scandinavia offer both parents up to 
a year of paid leave, for example, and under
UK national policy, institutions must pro- 
vide statutory maternity pay — a percent- 
age of average weekly earnings — for up 
to 39 weeks and leave of up to a year. The 
United States has a few standout examples. 
At Northwestern University in Evanston, 
Illinois, many departments offer mothers 
three months of full pay for childbear- 
ing and adoption leave and another three 
months for child rearing. “I never felt pressure
to be in the office during maternity
leave,” says Yarrow Axford, a geologist
who had a son last year. “Of course, I had
to keep up with what was happening in my lab
and with students conducting independent
research. But that is just part of doing the
business of science.” Northwestern also
grants a one-year extension on the tenure
clock to mothers after the birth of their
child and to both parents following an
adoption. Furthermore, it offers three months
of paternity leave — so Axford’s husband,
Christopher Kuzawa, a biological anthropologist at
Northwestern, also stayed home for three
months after their son's birth. “Those early
months are so formative,” says Axford. “If
more dads could take substantial time away
from work to be home with their children
in the early months, I suspect there would
be more dads out there who are 100% comfortable
being left alone with their kids,
and moms comfortable leaving them.”

“The scientific community could do a lot
about this period when you are pregnant and
you cannot be in the lab.”
Maria Granberg

Kuzawa notes that the time at home 
allowed him and Axford to jointly figure 
out the intricacies of caring for their son. 
“We established all the patterns together,” 
says Kuzawa. “So it never felt like one of 
us was the primary caregiver and the other 
was kind of in the passenger seat.” Axford 
heads out to Greenland this summer for 
three weeks of fieldwork, but she and her 
husband feel somewhat less anxious about 
her impending absence. “We were faced 
with everything that comes with being a 
parent — all that middle-of-the-night stuff, 
weird naps, crying for no reason, won’t eat,”
Kuzawa says. “We had to come up with
solutions to all of that. And we did.”


Amanda Mascarelli is a freelance writer 
in Denver, Colorado. 


TURNING POINT 


Collin Diedrich 


Collin Diedrich has overcome learning 
disabilities to carve out a promising career 
researching HIV and tuberculosis (TB) 
co-infection. He talks about what prompted his 
move from the United States to the University 
of Cape Town in South Africa. 


What challenges did your disabilities present? 
I was diagnosed with reading and learning dis- 
abilities in primary school in St Louis, Missouri. 
But my parents got me private tutors right away, 
who helped me develop strategies to improve 
my reading comprehension and organizational 
and memory skills. Given the stigma and feel- 
ings of inadequacy that can come with a learn- 
ing disability, I have struggled with impostor 
complex, cycling through phases where I feel 
completely out of place and inadequate, as if it 
is only a matter of time until I am ‘found out’.
Luckily, I am driven and have an amazing support
network in my advisers, colleagues, my
wife and my family, who have all been helpful 
and patient when my mind is racing. 


What drew you to a career in science? 

I didn't really think of science as a career 
option until I took a biology course at univer- 
sity and realized how much I liked the idea of 
becoming a researcher. The more I learned 
about HIV, the more fascinated I was by this 
virus that can attack your immune system. I 
read books about it, but what intrigued me 
most were the first-hand accounts of peo- 
ple with HIV. Before I read them, I couldn't 
understand why anyone would engage in the 
risky behaviours that could lead to HIV infec- 
tion, but I came to understand that the threat 
of death a decade or more later was not often 
an immediate concern, especially among those 
with already-risky lifestyles. 


Did you discuss your disabilities with any 
potential advisers? 

When I started my PhD at the University of 
Pittsburgh, I didn’t want to tell people about 
my learning disorder — I was nervous and 
intimidated about mentioning it. Then I met 
JoAnne Flynn, who was head of the molecular 
virology and microbiology department, and 
working on simian immunodeficiency virus 
(SIV) and TB co-infection. Our conversation 
went so well, I felt comfortable telling her. 
Without missing a beat, she directed me to the 
university's learning-disabilities centre. After 
spending time in her lab, I decided to do my 
dissertation with her, focusing on the cascade 
of immunological responses that follows SIV 
infection. She expected as much from me as
from anyone else, and was approachable and
helpful — which helped me both scientifically 
and emotionally. 


Why go to South Africa? 

I wanted to work on co-infection in human 
samples, which are in greater supply in South 
Africa than in the United States. I knew 
that Robert Wilkinson was working on co- 
infection at the University of Cape Town, so I 
secured funding to do my postdoc there. 


What did you find hardest about the move? 
When you start a postdoc in a new lab, you 
have to learn where things are and new tech- 
niques. I had cultural differences and practi- 
cal concerns to figure out, and I also had to 
determine how to get access to the samples 
necessary to research how HIV alters the 
granuloma, the inflamed tissue. 


How did it change your perspective? 

Seeing the effects of these devastating diseases 
first-hand has been a powerful experience, 
even though I am doing basic research. I have 
also learned that just because US scientists do 
something one way doesn’t mean it is the only 
way. I’ve replaced my US-centric views with a 
broader appreciation of research approaches. 


What do you plan to do now? 

I hope to continue working here throughout 
my career, at least half-time. I have funding 
until the end of 2015, and have developed 
strong collaborations. I would also like to be 
an advocate for students with learning disabil- 
ities and help university admissions officers 
find ways to look beyond examination scores 
to include candidates, such as myself, with 
research aspirations.
It’s all in the mind.

[…] looked like an old vulture with a
broken wing. He sat across the kitchen
table from Amon with one arm crooked
in a scarf bandage — a ruse that, the man
insisted, made it easier for him to pass the
sentinels with his crisp greenback bribes.

“Boring.” Amon blinked, shoulders
slumping with disappointment. He tugged
the electrodes from his temples, and they
retracted on threads spooling from Skinner’s
open valise. “You promised better.”

“They’re hard to come by, Sir.”

Amon narrowed his hazel eyes. “But
you’re easily replaceable.”

Skinner avoided the young Citizen’s gaze.
“I may have something you’ll like.” He busied
himself with a set of electrodes wired to a separate
receptacle inside the black leather bag.

“At additional cost, no doubt.”

“Of course.” Skinner cringed slightly and
bared his yellow teeth in an attempt to lighten
the mood. “Got my living to make, same as
anybody else. But I can offer you a discount
this evening — half off your second purchase.”

“Show me what you’ve got.” Amon gestured
limply for the smuggler to speed things
up. “No more murder and mayhem. I want
more life.”

Electrodes in hand, Skinner reached for
the gentleman’s forehead.

Amon leaned in. “Is it good?”

“Good, bad — what’s the difference?”
Skinner forced a chuckle.

Amon grabbed Skinner’s gaunt wrist, electrodes
dangling in midair. “One word from
me, old man, and you will never leave the
City. Answer me straight.”

“What I mean, Sir, is that the memories
themselves are often amoral in nature. A
murder, for example. Taken from the mind
of the victim’s mother, it’s the worst possible
memory. But from the mind of the killer —”

“I’ve had quite enough death.”

“Understood.”

Master Amon had required Skinner’s services
on multiple occasions over the past few
years, ever since the Wall had gone up around
the City and President Hoover had mandated
that all Citizens remain inside for their own
protection. The Hoovervilles outside — as
the rabble referred to them — were no place
for men of means. But as everyday life behind
the Wall stagnated, the wealthy had grown
desperate for new sources of entertainment.

“I’m sure you will find this one to be a fine
diversion from the norm. If you please.”


Amon released him. “How'd you come 
by it?” 

“Do you really want to know?” Skinner 
attached the electrodes to Amon’s temples. 

Amon held his gaze. 

“Very well.” Skinner cleared his throat. 
“Some of the commoners have exhausted 
their supply of bad memories, so to speak. 
So they have had to resort to releasing good 
ones in light of their... current hardships.” 

“Winter.” 

“When it means the difference between 
buying a warm coat and retaining a lingering 
memory of Grandma on Christmas morn- 
ing —” Skinner noted the Citizen's look of 
disgust. “Not to worry. Considering how well 
you pay me, there will be no grandmotherly 
memories for you tonight.” 

Electrodes in place, Amon leaned back in 
his chair. “Glad to hear it.”

Skinner reached into his bag. “Deep breaths 
now, Master Amon. You know the drill.” 

The Citizen shut his eyes and inhaled 
through his nostrils, waiting for the visions 
of another life to play through his mind like 
a motion picture at the theatre — but with 
full colour and sound. 


The sentinel on duty frowned as he halted 
the Chrysler Six Roadster at the South Wall. 
Something about the driver didn’t look quite 
right. But as the stoop-shouldered fellow at 
the wheel reached into his sling, the guard 
relaxed. When good money changed hands, 
few questions tended to surface. 

“Quite a haul tonight.” He pocketed the cash.

Shrugging, the sentinel raised the gate and
waved the car through. 


Gunning the engine, Amon glanced into the 
driver’s side mirror as he tossed his hat onto 
the vacant passenger seat. He tugged off the 
makeshift sling and stretched his arm, reach- 
ing back for the bound and gagged man hid- 
den under a throw rug. 

“Time to wakey-wakey, Sacagawea.” He 
punched Skinner in the gut and yanked 
the gag down around his neck, steering 
one-handed towards the shanties and tents 
that spread far beyond the reach of the car’s 
headlights. 

“Have you lost your mind?” Skinner sput- 
tered. 

“I’d be lost without you.” As the Wall
shrank behind them, Amon released a 
whoop. “I feel so alive!” 

“What you're feeling now — it’s not real. It 
wasn't your memory —” 

“It’s mine now. And I’m in love! You hear
me?” He howled up at the Moon. “I’ve got 
to find her!” 

“She won't know you from Madame But- 
terfly, Sir. And when the G-men find out 
you've flown the coop —” 

Amon howled again. 

Skinner sighed. In these harsh economic 
times, it was best not to give people false 
hope. Yet that’s exactly what this Citizen was 
riding high on at the moment. 

But he would crash back to reality in 
time. It was inevitable that he would return 
to his walled city after the throes of true 
love faded and dropped him into the depths 
of despair. Amon would suffer a low so dev- 
astating he could never hope to recover on 
his own. 

And Skinner would be right there with his 
valise, when all Amon would demand were 
bad memories again, the variety that never 
allowed anyone to soar the heavens only to 
collapse into hell’s yawning abyss. 

Skinner cleared his throat. “Do you even 
know her name, Sir?” 

Amon scoffed at such a foolish question. 
He and the woman were soul mates now, 
after all. “Florence,” he swooned.

Skinner almost smiled. “Well, that’s a 
start.”